   The Internet is a network that connects
    thousands of other computer networks.
   The Internet, simply, is a federation of
    computer networks that speak the same language.
   The Internet consists of thousands of
    networks worldwide.
Basic Principle

   Any point (node) on the network would have equal
    status and the capability to originate, pass and receive
    messages.
   Messages would be divided into separately addressed
    packets starting from a specified source and
    transferring to a specified destination.
   The packets, which collectively form the message,
    would be sent individually, finding their own way to the
    destination. The route each packet took would not be
    important but the end result of a successfully
    communicated message was paramount.
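The principle above can be illustrated with a short Python sketch. It is a toy model rather than real networking code: the `Packet` fields, the 8-byte packet size and the node names are assumptions chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    source: str       # address of the originating node
    destination: str  # address of the receiving node
    seq: int          # position of this piece within the message
    total: int        # number of packets that make up the message
    payload: bytes    # the piece of the message carried by this packet


def split_message(message: bytes, source: str, destination: str,
                  size: int = 8) -> List[Packet]:
    """Divide a message into separately addressed packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    return [Packet(source, destination, n, len(chunks), c)
            for n, c in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message regardless of the order in which packets arrived."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("message incomplete: some packets are missing")
    return b"".join(p.payload for p in ordered)


message = b"The route does not matter, only the result."
packets = split_message(message, "node-A", "node-B")
random.shuffle(packets)  # simulate packets taking different routes
received = reassemble(packets)
print(received == message)
```

The shuffle stands in for packets finding their own way: the receiver relies only on the sequence numbers, not on arrival order.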

   1965 Department of Defense’s Advanced
    Research Projects Agency (ARPA) sponsors
    research into a "cooperative network of
    time-sharing computers". (How could U.S. authorities talk
    to each other after a nuclear attack?)
   1969 Researchers at four US campuses create the
    first hosts of the ARPANET, connecting Stanford
    Research Institute, UCLA, UC Santa Barbara, and
    the University of Utah with 56 kbps communications
    lines.

   ARPA’s goals
       Allow multiple users to send and receive info at the same
        time
       Network operated using a packet-switching technique
         Digital data sent in small packages called packets
         Packets contained data, address info, error-control info
           and sequencing info
         Greatly reduced transmission costs of dedicated
           communications lines
       Network designed to be operated without centralized
        control
         If a portion of the network fails, remaining portions are still
           able to route packets
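How sequencing and error-control information might travel alongside the data can be sketched in Python. The layout used here (a 16-bit sequence number and a CRC-32 checksum in front of the payload) is an assumption for illustration and is not the actual ARPANET packet format; address information is left out to keep the example short.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header: sequence number (16 bits) + CRC-32 of the payload (32 bits),
# both in network byte order.
HEADER = struct.Struct("!HI")


def encode_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and checksum."""
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def decode_packet(raw: bytes) -> Tuple[int, bytes]:
    """Return (seq, payload), or raise if the checksum does not match."""
    seq, checksum = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != checksum:
        raise ValueError(f"packet {seq} was damaged in transit")
    return seq, payload


good = encode_packet(7, b"hello")
damaged = good[:-1] + b"x"  # corrupt the last byte of the payload

print(decode_packet(good))
try:
    decode_packet(damaged)
except ValueError as err:
    print(err)
```

A receiver that detects a damaged packet can ask for just that packet again, which is one reason small packets suit unreliable lines.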
Who is in charge?

   There is no single person or body in overall
    control of the Internet. There are agreed
    procedures for communication and various
    voluntary committees that set technical
    standards.
   What is illegal under copyright, libel or
    obscenity laws in one country may be
    acceptable in another.

   The Internet Society (ISOC) is a voluntary
    organization with more than one hundred
    members and is regarded as having the
    greatest influence over how the Internet
    develops in the future.

   The Internet Architecture Board (IAB) is
    appointed by ISOC and has responsibility for
    technical management and direction of the
    Internet.
   ISOC controls the TCP/IP standard through
    both the IAB and its task forces. However,
    the process for change on the Internet is
    usually through documents called Requests
    for Comments (RFCs).

   The Internet Network Information Center
    consists of three organizations known
    collectively as InterNIC. InterNIC is also the
    registrar for domains and network numbers
    for the Internet.
   WWW
       The World Wide Web (WWW) was developed by Tim
        Berners-Lee and other research scientists at CERN, the
        European center for nuclear research, in the late 1980s
        and early 1990s.
       WWW is a client-server model and uses TCP connections
        to transfer information or web pages from server to client.
       WWW uses a Hypertext model. Hypertext allows
        interactive accesses to a collection of documents.
       Documents can hold
           Text (hypertext), Graphics, Sound, Animations, Video
       Documents are linked together
           Non-distributed – all documents stored locally (e.g on
           Distributed – documents stored at remote servers on
            the Internet.
   WWW - Hyperlinks (or links)
       Each document contains links (pointers) to other
        documents.
       A link is represented by an "active area" on screen
           Graphic - button
           Text - highlighted
       By selecting a particular link, the client fetches the
        referenced document from a server for display.
       Links may become invalid.
       Link is simply a text name for a remote document.
       Remote document may be moved to a new location while
        name in link remains in place.
   WWW – Document Representation
       Each WWW document is called a page.
       Initial page for an individual or organization is called a home
        page.
       Page can contain many different types of information; page
        must specify:
         Content – The actual information
         Type of content – The type of information, e.g. text, pictures
         Links to other documents
       Rather than having a fixed representation for every
        browser, pages are formatted with a markup language.
       This allows browser to format page to fit display.
       Different browsers can display pages in different ways.
       This also allows text-only browser to discard graphics for
       Standard is called HyperText Markup Language (HTML).
   WWW – HTML
       HTML specifies
           Major structure of document
           Formatting instructions for browsers to execute.
           Hypertext links – Links to other documents
           Additional information about document contents
       Two parts to document:
           Head contains details about the document.
           Body contains the information/content of the
            document.
       Each web page is represented in ASCII text with
        embedded HTML tags that give formatting instructions to
        the browser.
           Formatted section begins with tag, <TAGNAME>
           End of formatted section is indicated by </TAGNAME>
   WWW – HTML Example
<HTML><HEAD>
  <TITLE>Example Page for lecture</TITLE>
</HEAD><BODY>
Lecture notes for today go here!
<TABLE><TR>
  <TD><A HREF="./lecture10.html">Previous Lecture</A></TD>
  <TD><A HREF="./lecture12.html">Next Lecture</A></TD>
  <TD><A HREF="./Contents.html">Table of contents</A></TD>
  <TD><A HREF="./solutions.html">Solutions to Assignments</A></TD>
  <TD><A HREF="./index.html">Index of terms</A></TD>
</TR></TABLE>
</BODY></HTML>
   WWW – Other HTML Tags
       Headings - <H1>, <H2>
       Lists
           <OL> - Ordered (numbered) list
           <UL> - Unordered (bulleted) list
           <LI> - List item
       Tables
           <TABLE>, </TABLE> - Define table
           <TR> - Begin row
           <TD> - Begin item in row
       Parameters
           Keyword-value pairs in HTML tags
           <TABLE BORDER=3>
   WWW – Embedding Graphics
       IMG tag specifies insertion of graphic
         Parameters:
         SRC="filename"

         ALIGN= - alignment relative to text

       <IMG SRC="GCD.gif" HEIGHT=35 WIDTH=30>
       The above line would insert the image in the file GCD.gif
        into any web page.
       Image must be in format known to browser, e.g., Graphics
        Interchange Format (GIF), Joint Photographic Experts
        Group (JPEG), Bitmap etc
   WWW – Style
       The layout and format of an HTML document can be
        simplified by using CSS (Cascading Style Sheets)
    <style type="text/css">
    body {background-color: yellow}
    h1 {background-color: #00ff00}
    h2 {background-color: transparent}
    p {background-color: rgb(250,0,255)}
    </style>

    <h1>This is header 1</h1>
    <h2>This is header 2</h2>
    <p>This is a paragraph</p>

   WWW – Identifying a web page
       A web page is identified by:
         The protocol used to access the web page.
         The computer on which the web page is stored.

         The TCP port that the server is listening on to allow a client
          to access the web page.
         Directory pathname of web page on server.

       Specific syntax for Uniform Resource Locator (URL):
         protocol://computer_name:port/document_name
         Protocol can be http, ftp, file, mailto.

       Computer name can be DNS name or IP address.
       TCP port is optional (http uses port 80 as its default port).
       document_name is path on server to web page (file).
   WWW – Identifying a web page
       E.g.
       Protocol is http
       Computer name or DNS name is
       Port number is the default port for http, i.e. port 80.
       Document name is /Recreation/Sports/Soccer/index.html
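The breakdown above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch: the host `www.example.com` is a placeholder assumed for illustration (it is not the computer name from the slide), and `describe_url` and `DEFAULT_PORTS` are hypothetical names.

```python
from urllib.parse import urlsplit

# Default server ports for the protocols we know about.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe_url(url: str) -> dict:
    """Split a URL into the four parts that identify a web page."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "computer": parts.hostname,
        # The port is optional in a URL; fall back to the protocol default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "document": parts.path or "/",
    }


info = describe_url("http://www.example.com/Recreation/Sports/Soccer/index.html")
for name, value in info.items():
    print(f"{name}: {value}")
```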
   WWW – Hyperlinks between web pages
       Each hyperlink is specified in HTML by using a special tag.
       An item on a page is associated with another HTML
        document.
       Each link is passive, no action is taken until link is selected.
       HTML tags for a hyperlink are <A> and </A>
       The linked document is specified by parameter to the tag:
        HREF="document URL"
       <A HREF="">Click here to go to GCD
        web site.</A>
       Whatever is between the HTML tags, <A> and </A> is the
        highlighted hyperlink.
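As a rough illustration of how a program can find the hyperlinks in a page, the sketch below uses Python's standard `html.parser` module to collect each `HREF` together with the highlighted text between `<A>` and `</A>`. The class name `LinkCollector` and the sample page are assumptions; the two links are taken from the lecture example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <A HREF="..."> ... </A> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # HREF of the link currently being read, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


page = ('<TD><A HREF="./lecture10.html">Previous Lecture</A> '
        '<A HREF="./lecture12.html">Next Lecture</A>')
collector = LinkCollector()
collector.feed(page)
collector.close()
for href, text in collector.links:
    print(f"{text} -> {href}")
```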
   WWW – Client Server Model
       The browser is the client, WWW (or web) server is the
        server.
       Browser:
         The browser makes TCP connection to the web server.
         The browser sends request for the particular web page that
          it wishes to display.
         The browser reads the contents of the web page from the
          TCP connection and displays it in the browser's window.
         The browser closes the TCP connection used to transfer
          the web page.
       Each separate item in a web page (e.g., pictures, audio)
        requires a separate TCP connection.
       HyperText Transfer Protocol (HTTP) specifies commands
        that the client (browser) issues to the server (web server)
        and the responses that the server sends back to the client.
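The browser steps listed above (connect, send request, read reply, close) can be sketched with Python's `socket` module. To keep the example self-contained it talks to a tiny stand-in server started on the local machine; `DemoHandler`, `fetch` and the page contents are assumptions for illustration, not part of any real site.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in web server that returns one tiny page for any path."""

    def do_GET(self):
        body = b"<HTML><BODY>Hello</BODY></HTML>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(host: str, port: int, path: str) -> bytes:
    # 1. Make a TCP connection to the web server.
    with socket.create_connection((host, port), timeout=5) as conn:
        # 2. Send the request for the page.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # 3. Read the reply until the server closes its side.
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    # 4. Leaving the "with" block closes the TCP connection.
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: use any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = fetch("127.0.0.1", server.server_port, "/index.html")
server.shutdown()
server.server_close()

headers, _, page = reply.partition(b"\r\n\r\n")
print(headers.split(b"\r\n")[0].decode())
print(page.decode())
```

A real browser would repeat these steps for every picture or other item the page refers to.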
         Figure 1-1: Web client/server architecture
Web Server Basics
   Duties
       Listen to a port
       When a client is connected, read the HTTP
        request
       Perform some lookup function
       Send HTTP response and the requested data
 Serving a Page
   The exchange between the client (e.g. Netscape) and the server (e.g. Apache):
     User of client machine types in a URL
     Server name is translated to an IP address
      via DNS
         http:// /index.html
     Client connects to server using IP address
      and port number (port 80)
     Client determines path and file to request
     Client sends HTTP request to server
         GET /index.html HTTP/1.1
     Server determines which file to send
         "index.html" is really
     Server sends response code and the
      requested file
         HTTP/1.1 200 OK
         Content-type: text/html
         [contents of index.html]
     Connection is broken
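The server's side of the exchange above can be sketched in a few lines of Python. This is a simplified illustration, not how Apache works: the `PAGES` table, the function names and port 8080 are assumptions, and only GET is handled.

```python
import socket

# Hypothetical site content: maps a requested path to the page to send.
PAGES = {"/index.html": b"<HTML><BODY>Welcome</BODY></HTML>"}


def build_response(request: bytes) -> bytes:
    """Read the request line, look the path up, and build the reply."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii", "replace")
    try:
        method, path, _version = request_line.split()
    except ValueError:
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    body = PAGES.get(path)  # the "lookup function"
    if body is None:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    header = ("HTTP/1.0 200 OK\r\n"
              "Content-Type: text/html\r\n"
              f"Content-Length: {len(body)}\r\n\r\n")
    return header.encode("ascii") + body


def make_listener(port: int = 8080) -> socket.socket:
    """Listen on a port (8080 here, since port 80 usually needs privileges)."""
    return socket.create_server(("127.0.0.1", port))


def serve_once(listener: socket.socket) -> None:
    """Accept one client, answer its request, then break the connection."""
    conn, _addr = listener.accept()
    with conn:
        # A single recv is enough for a short request in this sketch.
        conn.sendall(build_response(conn.recv(65536)))


print(build_response(b"GET /index.html HTTP/1.0\r\n\r\n").decode())
```

To try it with a browser, call `serve_once(make_listener())` and open `http://127.0.0.1:8080/index.html`.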
   HTTP is…
       Designed for document transfer
       Generic
           not tied to web browsers exclusively
           can serve any data type
       Stateless
           no persistent client/server connection
HTTP Protocol Definitions
   MIME
       Multipurpose Internet Mail Extensions
       Standards for encoding different media types
        in a message
       Originally developed for emailing files and
        messages in different languages
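Web servers typically pick the MIME media type to announce from the file name. A small sketch using Python's standard `mimetypes` module; the helper name `content_type` and the file names other than GCD.gif are assumptions for illustration.

```python
import mimetypes


def content_type(filename: str) -> str:
    """Guess the MIME media type of a file from its extension."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return media_type or "application/octet-stream"


for name in ("index.html", "GCD.gif", "notes.txt"):
    print(f"{name}: {content_type(name)}")
```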
   WWW – HTTP Protocol
       When a user types in,
        the browser creates an HTTP GET Request message and
        sends it over a TCP connection to the web server.
       In the above case, the HTTP GET Request message
        would be

    GET /Recreation/Sports/Soccer/index.html HTTP/1.0
    User-Agent: InternetExplorer/5.0
    Accept: text/html, text/plain, image/gif, audio/au
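The request message shown above is plain text with lines separated by carriage return and line feed. A minimal Python sketch of how a client might assemble it; the function name `build_get_request` is an assumption, and the header values are the ones from the example.

```python
from typing import List


def build_get_request(path: str, user_agent: str, accept: List[str]) -> bytes:
    """Assemble an HTTP/1.0 GET request as the bytes sent over the TCP connection."""
    lines = [
        f"GET {path} HTTP/1.0",         # request line
        f"User-Agent: {user_agent}",    # header: browser being used
        "Accept: " + ", ".join(accept),  # header: media types accepted
        "",                              # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request(
    "/Recreation/Sports/Soccer/index.html",
    "InternetExplorer/5.0",
    ["text/html", "text/plain", "image/gif", "audio/au"],
)
print(request.decode("ascii"))
```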
   WWW – HTTP Request messages
       HTTP Request messages are sent from client to server.

          Parts of the message, in order:
          Request Line           Type of request (e.g. GET)
          Optional HTTP Header   Additional information such as browser being
                                 used, media types accepted
          "\r\n"                 Delimiter: carriage return, line feed
          Optional Data          User data, e.g. contents of completed form
       There are a number of valid HTTP Request messages
         GET – Used to request a web page from a web server
         HEAD – Return the header of a web page, used by search
          engines to test the validity of hyperlinks
         POST – Used to send data (e.g. results of registration form)
          to a web server
         PUT / DELETE – Not typically implemented by browsers.
   WWW – HTTP Response messages
       HTTP Response messages are sent from server to client.

          Parts of the message, in order:
          Status Line            Success/failure indication: number between
                                 200 and 599
          Optional HTTP Header   Type of content returned, e.g. text/html or
                                 image/gif
          "\r\n"                 Delimiter
          Optional Data          Requested data, e.g. web page

       The Status Line gives information about the success of the
        previous HTTP Request
         200 – 299     Success
         300 – 399     Redirection – Document has been moved
         400 – 499     Client Error – Bad Request, Unauthorised,
          Not found
         500 – 599     Server Error – Internal Error, Service
          Overloaded
       Sample web server log entries; the status code and the number of
        bytes sent follow each request:
    - - [10/Jan/2004:19:22:09 +0000] "GET /cmt3092 HTTP/1.1" 301 366
    - - [10/Jan/2004:19:22:10 +0000] "GET /cmt3092/ HTTP/1.1" 200 343
    - - [10/Jan/2004:19:22:17 +0000] "GET /cmt3092/xampp-win321.2.exe HTTP/1.1" 404 1243
    - - [10/Jan/2004:19:22:47 +0000] "GET /cmt3092/xampp-win321.2.exe HTTP/1.1" 404 1117
    - - [10/Jan/2004:19:23:29 +0000] "GET /cmt3092/ HTTP/1.1" 200 344
    - - [10/Jan/2004:19:23:36 +0000] "GET /cmt3092/xampp-win32-1.2.exe HTTP/1.1" 200 28329331
    - - [10/Jan/2004:19:33:08 +0000] "GET /cmt3092 HTTP/1.1" 301 366
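The status ranges above can be applied to log entries like these with a short Python sketch. The regular expression is an assumption that fits the entries as they appear here (the client address fields are not used); `status_class` and `LOG_LINE` are hypothetical names.

```python
import re
from collections import Counter

# Matches: [timestamp] "METHOD path version" status size
LOG_LINE = re.compile(
    r'\[(?P<when>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def status_class(code: int) -> str:
    """Map a status code to the category named in the Status Line table."""
    if 200 <= code <= 299:
        return "Success"
    if 300 <= code <= 399:
        return "Redirection"
    if 400 <= code <= 499:
        return "Client Error"
    if 500 <= code <= 599:
        return "Server Error"
    return "Other"


log = '''
- - [10/Jan/2004:19:22:09 +0000] "GET /cmt3092 HTTP/1.1" 301 366
- - [10/Jan/2004:19:22:10 +0000] "GET /cmt3092/ HTTP/1.1" 200 343
- - [10/Jan/2004:19:22:17 +0000] "GET /cmt3092/xampp-win321.2.exe HTTP/1.1" 404 1243
'''

entries = [m.groupdict() for m in LOG_LINE.finditer(log)]
summary = Counter(status_class(int(e["status"])) for e in entries)
for entry in entries:
    print(entry["path"], entry["status"], status_class(int(entry["status"])))
```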
   WWW – Caching Web pages
       Downloading HTML documents from servers can be slow
        due to a number of conditions:
         Parts of the Internet can be congested

         Dialup connection is typically very slow, 33Kbps or 56Kbps

         Web server can have a lot of clients connecting to it at the
           same time, causing it to be overloaded.
       If a user returns to previous HTML document, then this
        could require downloading the document from the server
        again.
       A browser can hold copies of recently visited pages. This
        avoids having to download pages again.
       An organisation can use a HTTP proxy that caches
        documents for multiple users, thus improving the speed at
        which pages can be displayed on each user's computer.
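The idea of holding copies of recently visited pages can be sketched as a small Python class. This is a simplified model under stated assumptions: `PageCache` and `slow_fetch` are hypothetical names, the oldest entry is dropped when the cache is full, and real browsers and proxies also check whether a stored copy is still fresh.

```python
from typing import Callable, Dict


class PageCache:
    """Keeps copies of recently fetched pages, keyed by URL."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 100):
        self._fetch = fetch
        self._capacity = capacity
        self._pages: Dict[str, bytes] = {}  # insertion order doubles as age order

    def get(self, url: str) -> bytes:
        if url in self._pages:
            page = self._pages.pop(url)      # re-inserted below as most recent
        else:
            page = self._fetch(url)          # slow path: download from the server
            if len(self._pages) >= self._capacity:
                oldest = next(iter(self._pages))
                del self._pages[oldest]      # make room by dropping the oldest page
        self._pages[url] = page
        return page


downloads = []


def slow_fetch(url: str) -> bytes:
    """Stand-in for a real download; records each URL actually fetched."""
    downloads.append(url)
    return url.encode()


cache = PageCache(slow_fetch, capacity=2)
for url in ("a", "a", "b", "c", "a"):
    cache.get(url)
print(downloads)
```

The second request for "a" is served from the cache; the last one is downloaded again because "a" had been dropped to make room for "c".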
   WWW – Browser Architecture

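       [Figure: browser architecture. Input from keyboard and mouse
        goes to the controller; the HTML interpreter and other
        interpreters pass output through the display driver to the
        display; the HTTP client and other clients communicate with
        the remote server through the network interface.]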
   WWW – Browser Architecture
       Browser has more components than a server:
           Display driver for painting screen.
           HTML interpreter for formatting HTML documents.
           Plugins to display different content (e.g., Shockwave
            or Real Audio content)
           HTTP client to fetch HTML documents from WWW
            servers
           Other clients for other protocols (e.g., ftp, mail)
           Controller also must accept input from the computer
            user through the mouse or keyboard.
Other Protocols
   FTP - File Transfer Protocol
       The Internet began development in the 1960s.
       Moving a file from one computer to another computer
        required some form of removable medium (floppy disk or
       People required a protocol to reliably transfer files between
        any two computers connected to the Internet.
       Why not use HTTP?
           The HTTP protocol was developed in the late 1980s
            and the early 1990s.
           HTTP provides a poor authentication mechanism for
            users of the protocol.
           HTTP doesn’t easily allow files to be sent in both
            directions.
           HTTP doesn’t allow files to be downloaded in separate
            stages.
   FTP - Functions
       The main function of FTP was to allow the sharing of files
        across the Internet.
       Other functions included
           Allowing computer users to use computers remotely.
           Hiding file storage differences from the user. The
            format in which files are stored on a Macintosh is
            different from that on a PC, which in turn is different
            from that on a Unix workstation. Different length
            filenames also have to be accommodated.
           Transfer of file data between computers has to be
            done reliably and efficiently. FTP should also allow
            transfer of very large files to be done in stages.
   FTP
       FTP is a client/server program
       An FTP client program enables the user to interact with an
        ftp server in order to access files on the ftp server
       Client programs can be:
         Simple command line interfaces. E.g. MS-Dos Prompt
                   C:\ ftp
         Integrated with Web browsers, e.g. Netscape Navigator,
            Internet Explorer.
       FTP provides similar services to those available on most
        filesystems: list directories, create new files, download
        files, delete files.
       FTP uses TCP connections and the default server port for
        FTP is 21.
   FTP - Transfer modes
     Batch transfer
           User creates list of files to be transferred by ftp
           User's request is dropped into a queue of similar
            requests
           FTP program reads requests and performs transfers
            of files.
           Transfer program can retry until successful.
           Good for slow or unreliable transfers.
       Interactive transfer
           User starts ftp program
           User can interactively list contents of directories,
            transfer files, delete files etc.
           User can find and transfer files immediately
           Quick feedback in case of mistakes, e.g., spelling
            errors
   FTP - Sample Commands
       Command                  Description
        ftp                      Open connection to computer
        ls                       List directory contents
        cd                       Change to another directory
        bin                      Change to binary transfer, used for
                                 downloading executables.
        get                      Download a file from the remote computer
        put                      Upload a file to the remote computer
        mget                     Start download of multiple files
        mput                     Start upload of multiple files
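The same operations are available from programs. The sketch below uses Python's standard `ftplib`; the helper names `download` and `list_directory`, the host `ftp.example.com` and the file names are assumptions for illustration.

```python
from ftplib import FTP
from typing import List


def list_directory(ftp: FTP) -> List[str]:
    """Rough equivalent of the "ls" command."""
    return ftp.nlst()


def download(ftp: FTP, directory: str, filename: str) -> bytes:
    """Rough equivalent of: cd <directory>, bin, get <filename>."""
    ftp.cwd(directory)                                  # cd
    chunks: List[bytes] = []
    # retrbinary switches to binary transfer itself and passes each
    # block of the file to the callback as it arrives.
    ftp.retrbinary(f"RETR {filename}", chunks.append)   # bin + get
    return b"".join(chunks)


# Usage against a real server (hypothetical host, anonymous login):
#
#     with FTP("ftp.example.com") as ftp:
#         ftp.login()
#         print(list_directory(ftp))
#         data = download(ftp, "/pub", "readme.txt")
```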
   FTP - Checkpointing
       A data transfer may be aborted after only transferring part
        of a file.
         This could be due to the client or the server crashing, the
           TCP connection being broken due to congestion, phone
           hanging up during dial up connection.
       FTP allows the file transfer to resume from where the transfer
        was stopped, with no need to re-transfer the part of the file
        already received.
       FTP achieves this by sending restart markers between
        the server and the client.
       Restart markers are saved in a restart file by the client.
        Client sends restart marker when it wants to continue the
        transfer of a previously stopped transfer.
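Restarting from a marker can be illustrated with a small simulation in Python. It makes a simplifying assumption: the restart marker is just the number of bytes the client has already saved. `ConnectionLost`, `send_from` and `receive` are hypothetical names, and no real FTP connection is involved. (With Python's `ftplib`, the `rest` argument of `retrbinary` plays a comparable role.)

```python
from typing import Iterator, Optional


class ConnectionLost(Exception):
    """Raised by the simulated link when the transfer is cut off."""


def send_from(data: bytes, offset: int, chunk_size: int = 4,
              fail_after: Optional[int] = None) -> Iterator[bytes]:
    """Server side: yield chunks of data starting at the restart marker."""
    sent = 0
    for start in range(offset, len(data), chunk_size):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionLost
        yield data[start:start + chunk_size]
        sent += 1


def receive(data: bytes, saved: bytearray, **link) -> None:
    """Client side: ask the server to continue from what is already saved."""
    for chunk in send_from(data, offset=len(saved), **link):
        saved.extend(chunk)


remote_file = b"0123456789abcdefghij"
local_copy = bytearray()

try:
    receive(remote_file, local_copy, fail_after=2)  # link drops after two chunks
except ConnectionLost:
    pass
bytes_before_resume = len(local_copy)

receive(remote_file, local_copy)                    # resume from the marker
print(bytes_before_resume, bytes(local_copy) == remote_file)
```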

   Title – an embedded description provided by the
    document designer; viewable in the titlebar (it is
    also used as the description of a newly created
    bookmark by most browsers)
   Description – a type of metatag which provides
    a short, summary description provided by the
    document designer; not viewable on the actual
    page; this is frequently the description of the
    document shown on the documents listings by
    the search engines that use metatags

    Keywords – another type of metatag
     consisting of a listing of keywords that
     the document designer wants search
     engines to use to identify the document.
     These, too, are not viewable on the actual
     page.
    Body – the actual, viewable content of
     the document.

   Order a keyword term appears –
    keyword terms that appear sooner in the
    document’s listing or index tend to be
    ranked higher
   Frequency of keyword term – keywords
    that appear multiple times in a document’s
    index tend to be ranked higher

   Occurrence of keyword in the title – keywords
    that appear in the document’s title, or perhaps
    metatag description or keyword description fields,
    can be given higher weight than terms only in the
    document body
   Rare, or less frequent, keywords – rare or
    unusual keywords that do not appear as
    frequently in the engine’s index database are
    often ranked more highly than common terms or