How the World Wide Web Works by mrobin5490


How the World Wide Web Works - HTTP
Boriana Koleva
Room C54

• What is the WWW?
• Who uses it and what for?
• How the WWW works - http

What is the World Wide Web
• It is not the Internet!
• It is the face of the Internet
• Invented in the late 1980s at CERN
  • to enable scientists to exchange information
  • first text-only web software in 1991
• Hypertext invented earlier
  • basic principles 1945
  • first implementations 1960s

History of the WWW
• 1993 - NCSA released Mosaic
• 1994 - Mosaic Communications Corporation was founded
  • later became Netscape Communications Corporation
• Mosaic development stopped once Netscape became dominant on the market
• Netscape is now competing with Microsoft’s Internet Explorer (IE)

How does the WWW work?
• The web is a huge collection of data stored at many different computers on the Internet
• WWW uses client/server approach
• Preferred format is HyperText Markup Language (HTML)
  • designed as an interchange format
  • all Web browsers should be able to correctly process this

How does the WWW work? (2)
• Information is requested primarily by hypertext links
  • contain Uniform Resource Locators (URLs)
    • usually embedded http requests
    • The http requests are sent to the server named in the URL
    • The server locates the named file and sends it back
    • The http request contains all the information needed to locate the appropriate server and tell it what to send back
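
The way a URL carries everything needed to locate the server and say what to send back can be sketched in a few lines of Python. This is only an illustration: the function name request_for and the host www.example.org are assumptions made for the example, not names from the slides.

```python
from urllib.parse import urlsplit


def request_for(url):
    """Split a URL into the server to contact and the request line to send it."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80      # plain http defaults to port 80
    path = parts.path or "/"     # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    request_line = f"GET {path} HTTP/1.1"
    return host, port, request_line


# www.example.org is a placeholder host used only for illustration.
print(request_for("http://www.example.org/index.html"))
```

The host and port identify the server; the path is what the server is asked to locate and send back.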

Who uses the Web?
• Why put information on the Web?
• Who wants to set up and manage servers and why?
  • The WWW as an academic resource
  • Commerce on the WWW
  • The Web’s publishing model

The Web’s Publishing Model
• Anyone, anywhere can put anything on the Web
  • Need file space on a Web server and a small amount of knowledge
• No “editor” - unvetted
  • therefore quality could be unknown
  • some sites might be more trusted - unless hacked!
• Obscurity

HTTP
• Hypertext Transfer Protocol
• Used by Web clients (browsers) and servers
• Fundamental to almost every Web request
• The main protocol of the Web
• All http transactions have the same general format:
  • request or response line; header section; entity body

HTTP Headers
• For transferring information between the client and server
• Four categories:
  • General, Request, Response, Entity
• Format:
  • header name, colon, space, value of header
  • header names are case-insensitive
  • e.g. Content-type: text/html
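
A minimal sketch of the header format (name, colon, space, value) and of case-insensitive lookup by header name. The helper names parse_header and get_header are made up for this example.

```python
def parse_header(line):
    """Split 'Name: value' into a (name, value) pair."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def get_header(headers, name):
    """Look a header up by name, ignoring upper/lower case."""
    wanted = name.lower()
    for key, value in headers:
        if key.lower() == wanted:
            return value
    return None


headers = [
    parse_header("Content-type: text/html"),
    parse_header("Content-length: 2482"),
]
print(get_header(headers, "CONTENT-TYPE"))
```

Only the header name is compared without regard to case; the value is returned exactly as sent.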

Client’s request
• Client contacts server and sends document request:
  • http command called a method
  • document address
  • http version number
    e.g. GET /index.html HTTP/1.1
• Client may then send some header information
    User-Agent: Mozilla/4.05 (WinNT; I)
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
• Client can then optionally send additional data

Server’s response
• Server replies with a status line
  • information on http version, status code, description
    e.g. HTTP/1.1 200 OK
• Server then sends header information
  • tells client about the server and the document - software name, document type, when last modified, etc.
    Date: Mon, 5 Feb 2001 08:12:56 GMT
    Server: NCSA/1.5.2
    Last-Modified: Thu, 7 Sep 2000 21:52:32 GMT
    Content-type: text/html
    Content-length: 2482
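
The request line, headers and status line described here can be assembled and taken apart with plain string handling. The sketch below assumes the usual HTTP convention that lines end in CRLF and that a blank line closes the header section; build_request and parse_status_line are illustrative names, not part of any library.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble a request: request line, header lines, then a blank line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line):
    """Split a status line into http version, status code and description."""
    version, code, description = line.split(" ", 2)
    return version, int(code), description


request = build_request(
    "GET", "/index.html", [("User-Agent", "Mozilla/4.05 (WinNT; I)")]
)
print(request)
print(parse_status_line("HTTP/1.1 200 OK"))
```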

Server’s response (2)
• If successful request, server sends document or data requested
• If unsuccessful, server might send additional information for the user
• HTTP 1.0
  • connection kept open only with the Connection: Keep-Alive header
• HTTP 1.1
  • default - server maintains connection

HTTP methods
• A method is a way of instructing the server how to respond
  • GET
  • HEAD
  • POST
  • LINK and UNLINK
  • TRACE
  • PUT and DELETE
  (LINK and UNLINK, TRACE, PUT and DELETE are less widely supported)

The GET method

Client Request
    GET /index.html HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/2.02Gold (WinNT; I)
    Host:
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

Server Response
    HTTP/1.0 200 Document follows
    Date: Fri, 20 Sep 1998 08:17:58 GMT
    Server: NCSA/1.5.2
    Last-modified: Mon, 17 Jun 1998 21:53:08 GMT
    Content-type: text/html
    Content-length: 2482

    (body of document here)

The HEAD method
• Functionally like GET
• Used to get information about a document
  • modification time
  • document size
  • type of document
  • type of server
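
The difference between GET and HEAD can be tried out against a throwaway local server. The following is a rough sketch using Python's standard http.server and http.client modules; the page content, the handler class and the function names are all invented for the demonstration.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><body>Hello</body></html>"  # stand-in document


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)   # headers plus the document itself

    def do_HEAD(self):
        self._send_headers()     # same headers, no entity body

    def log_message(self, *args):
        pass                     # keep the demo quiet


def fetch(method, port):
    """Send one request and return (status, Content-length header, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/index.html")
    response = conn.getresponse()
    body = response.read()
    result = (response.status, response.getheader("Content-length"), body)
    conn.close()
    return result


def compare_get_and_head():
    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for result in compare_get_and_head():
        print(result)
```

Both responses carry the same status and Content-length header; only the GET response carries the document body.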

The HEAD method

Client Request
    HEAD /index.html HTTP/1.1
    Connection: Keep-Alive
    User-Agent: Mozilla/2.02Gold (WinNT; I)
    Host:
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

Server Response
    HTTP/1.1 200 Document follows
    Date: Fri, 20 Sep 1998 08:17:58 GMT
    Server: NCSA/1.5.2
    Last-modified: Mon, 17 Jun 1998 21:53:08 GMT
    Content-type: text/html
    Content-length: 2482

    (no entity body sent in response)

The POST method
• Allows data to be sent to the server
  • data is in the entity-body section of client’s request
  • data directed to data handling program
• Applications include:
  • network services
  • command-line interface programs
  • annotation of documents on the server
  • database operations
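
How form data ends up in the entity body of a POST request can be sketched as follows. The field names month and date come from the slides' own example; the path /cgi-bin/example-handler and the function build_post are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def build_post(path, fields):
    """Build a POST request whose entity body is the URL-encoded form data."""
    body = urlencode(fields)  # e.g. month=august&date=24
    lines = [
        f"POST {path} HTTP/1.0",
        "Content-type: application/x-www-form-urlencoded",
        f"Content-length: {len(body.encode('ascii'))}",
        "",      # blank line separates the headers from the entity body
        body,
    ]
    return "\r\n".join(lines)


# The handler path below is a made-up placeholder.
request = build_post("/cgi-bin/example-handler", {"month": "august", "date": "24"})
print(request)
```

Content-length counts the bytes of the entity body, which is why the two fields above give a length of 20.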

The POST method

Client Request
    POST /cgi-bin/ HTTP/1.0
    User-Agent: Mozilla/2.02Gold (WinNT; I)
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Content-type: application/x-www-form-urlencoded
    Content-length: 20

    month=august&date=24

Other methods
• LINK
  • requests that header information is associated with a document on the server
• PUT
  • requests that the entity-body of the request is stored at the specified URL
• TRACE
  • requests that the entity-body be returned intact
• DELETE
  • requests the removal of data at a URL on the server

Server responses and status codes
• Status codes typically generated by web servers
  • but can be generated by CGI scripts

  Code Range   Response Meaning
  100-199      Informational
  200-299      Client request successful
  300-399      Client request redirected, further action necessary
  400-499      Client request incomplete
  500-599      Server errors

Status Codes
  Informational:              100 Continue; 101 Switching Protocols
  Client request successful:  200 OK; 201 Created; 202 Accepted; 204 No Content
  Redirection:                300 Multiple Choices; 301 Moved Permanently; 302 Found; 304 Not Modified
  Client request incomplete:  400 Bad Request; 402 Payment Required; 403 Forbidden; 404 Not Found
  Server errors:              500 Internal Server Error; 501 Not Implemented; 503 Service Unavailable; 505 HTTP Version Not Supported
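
The code ranges map directly onto a small lookup. The sketch below uses the response meanings exactly as worded on the slide; the function name status_class is an assumption for the example.

```python
# Ranges and meanings as listed in the code range table.
STATUS_RANGES = [
    (100, 199, "Informational"),
    (200, 299, "Client request successful"),
    (300, 399, "Client request redirected, further action necessary"),
    (400, 499, "Client request incomplete"),
    (500, 599, "Server errors"),
]


def status_class(code):
    """Return the response meaning for a numeric status code."""
    for low, high, meaning in STATUS_RANGES:
        if low <= code <= high:
            return meaning
    raise ValueError(f"not an HTTP status code: {code}")


for code in (200, 304, 404, 503):
    print(code, status_class(code))
```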

State in HTTP
• In http version 1.0, server disconnects from the client after completing request
  • therefore no logging on and off
  • the http server does not remember anything between requests
  • separate requests for each image in a page

State in HTTP (2)
• In http version 1.1, the server by default keeps the connection open
  • additional requests can be made
  • client or server usually has to explicitly close it
• BUT http is a stateless protocol
  • server does not keep a record of client’s previous activities
  • every client request has to contain all the necessary information
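
A rough way to see an open HTTP 1.1 connection being reused, while each request still names its document in full, is to run two requests over one connection to a local test server. This sketch uses Python's standard http.server and http.client; the handler class, the paths and the port-recording trick are assumptions made for the demonstration.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # connection stays open by default
    seen_ports = []                # client port observed for each request

    def do_GET(self):
        KeepAliveHandler.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.send_header("Content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                       # keep the demo quiet


def two_requests_one_connection():
    KeepAliveHandler.seen_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        statuses = []
        for path in ("/index.html", "/logo.gif"):
            conn.request("GET", path)  # every request is complete in itself
            response = conn.getresponse()
            response.read()            # finish reading before reusing the connection
            statuses.append(response.status)
        return statuses, list(KeepAliveHandler.seen_ports)
    finally:
        conn.close()                   # the client closes the connection explicitly
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(two_requests_one_connection())
```

Seeing the same client port for both requests indicates that one connection carried them both. The server still treats each request independently; the only shared record here is the list added for the demonstration.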

State in HTTP (3)
• Cf. telnet and FTP - both stateful
• HTTP being stateless has advantages:
  • a lot more clients can be served because there is no tracking overhead
• Disadvantage:
  • when it is necessary to maintain state between sessions, tricks such as cookies are needed

Cookies
• Designed to enable subsequent client requests to be “better” serviced and state to be preserved between
• Cookies allow server to write information to the client’s machine
  • Created when a client visits a new site
  • Stored in a “cookies” file
  • When client re-visits site, it checks its cookies file for any information about that URL
  • If there is information, client includes a cookie header in its request to the server
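
The cookie exchange can be sketched with Python's standard http.cookies module. This is a simplified illustration: the dictionary jar stands in for the browser's cookies file, and the host www.example.org, the cookie name visits and both helper functions are invented for the example.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value):
    """Server side: build the header that asks the client to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()  # a 'Set-Cookie: name=value' header line


def cookie_header(jar, host):
    """Client side: build a Cookie header if anything is stored for this host."""
    stored = jar.get(host)
    if not stored:
        return None
    return "Cookie: " + "; ".join(f"{k}={v}" for k, v in stored.items())


jar = {}  # stands in for the browser's cookies file

# First visit: nothing stored yet, so no cookie header is sent.
first_visit = cookie_header(jar, "www.example.org")

# The server's response asks the client to remember something.
response_header = set_cookie_header("visits", "1")
received = SimpleCookie()
received.load(response_header.split(": ", 1)[1])
jar["www.example.org"] = {key: morsel.value for key, morsel in received.items()}

# Re-visit: the client finds stored information and includes it in its request.
second_visit = cookie_header(jar, "www.example.org")
print(first_visit, response_header, second_visit, sep="\n")
```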

Cookies (2)
• Disadvantages:
  • Misuse
    • unknown or mistrusted sources
  • Information gathering
    • targeted advertising
    • selling to information brokers and marketing companies
• Solution
  • tailor browser to refuse cookie information or make file read only
  • but lose some functionality

Summary
• What is the WWW?
• How does it work?
• Who uses it?
• HTTP
• State in HTTP
• Cookies
