WWW Servers


                         Miroslav Milinovic
          Croatian Academic and Research Network - CARNet
                           Zagreb, Croatia

5th CEENet Workshop on Network Technology, Budapest, Hungary, August 1999.

•   How does the WWW work?
•   What is a WWW server?
•   Apache
     – directory structure; configuration files & directives; running
     – access control; authentication
     – Common Gateway Interface (CGI); passing data
     – Server side includes (SSI)
     – API: modules & handlers
     – virtual servers
•   Log analysis tools
•   Sources
•   Future

               How does the WWW work?
[Figure: authors write HTML files; WWW servers deliver them over the Internet; users browse them.]

      What is a WWW server?
•   general purpose data delivery vehicle
•   a program (daemon, httpd):
    –   responds to an incoming TCP connection
        and provides a service to the client
    –   runs independently
•   WWW servers:
    –   do NOT validate HTML code (parse documents)
    –   do NOT check links
    –   follow MIME rules (without checking file content)

•   Web site = host + Web server + information (file system)

            HTTP (HyperText Transfer Protocol)
•   application-level protocol
•   stateless (HTTP 1.0)
•   the client (browser) makes a request, the server responds
•   support for:
    –   use of URLs
    –   Internet media types (MIME types: RFC 2045 - RFC 2049)
•   allows access to different data formats
•   standards:
    –   HTTP 1.0 (RFC 1945), HTTP 1.1 (RFC 2068, January 1997)


    URL structure:  protocol://server name:port/directory/file name on the server

•    HTTP is a simple protocol:
       1. The client determines that it should use the HTTP protocol
       2. The client opens a TCP connection to the server info.nowhere.hr
          on port 8000 (or, if no port is specified, on the default port 80)

    Client - server communication
•   Simple client request (entered manually)
    telnet www.srce.hr 80
    Connected to regoc.srce.hr.
    Escape character is '^]'.
    GET /index.html HTTP/1.0
    ACCEPT: */*
    USER-AGENT: manually entered HTTP
    (blank line)

    Client - server communication
•   Server reply:
    HTTP/1.0 200 OK
    Date: Tue, 29 Jul 1997 12:56:15 GMT
    Server: Apache/1.1.3
    Content-type: text/html
    Content-length: 2320
    Last-modified: Fri, 22 Nov 1996 10:07:27 GMT
    (blank line)
    (content - document source)
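
The exchange above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular browser or server is implemented: the function names build_request and parse_response are hypothetical, the reply is a shortened copy of the one on the slide with a placeholder body, and the socket code that would actually send the bytes is left out.

```python
def build_request(path, headers):
    """Build an HTTP/1.0 GET request: request line, headers, then a blank line."""
    lines = ["GET %s HTTP/1.0" % path]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw reply into (status code, reason, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the header
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request("/index.html", [("Accept", "*/*"),
                                         ("User-Agent", "manually entered HTTP")])

# Shortened copy of the server reply shown above (the body is a placeholder).
reply = (b"HTTP/1.0 200 OK\r\n"
         b"Server: Apache/1.1.3\r\n"
         b"Content-type: text/html\r\n"
         b"Content-length: 2320\r\n"
         b"\r\n"
         b"<html>...</html>")
code, reason, headers, body = parse_response(reply)
print(code, reason, headers["content-type"])
```

Header names are lower-cased on parsing because HTTP header names are case insensitive, which is why ACCEPT and Accept are equivalent in the request.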

       Request methods
Method   Client           Server

GET      request          send header & data
HEAD     request          send header
POST     request & data   receive data (pass to CGI script)
PUT      request & data   receive data (store as requested)

                  WebDAV
•   WWW Distributed Authoring and Versioning
•   an extension of HTTP 1.1
•   provides infrastructure for asynchronous collaborative
    authoring across the Internet
•   IETF approved - November 1998
•   supported by Microsoft and Apache
•   WebDAV home page
    –   http://www.ics.uci.edu/pub/ietf/webdav/

[Figure: the client performs File Open, File Save and File Close against the server.]

             Server status codes
•   Status codes are three digit numbers grouped as
        1xx - informational
        2xx - client request successful
               200 - OK
        3xx - request redirected
        4xx - client errors (request incomplete)
               403 - Forbidden
               404 - Not found
        5xx - server errors
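
Because the first digit carries the meaning, a client can classify any reply without knowing every individual code. A minimal sketch, with the class descriptions taken from the list above and a hypothetical function name:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "client request successful",
    3: "request redirected",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class of a three-digit status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError("not a valid HTTP status code: %r" % code)
    return STATUS_CLASSES[code // 100]


for code in (200, 403, 404):
    print(code, status_class(code))
```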

            WWW server software
•   traditionally freely available
•   for most platforms:
    –   UNIX, MS Windows, Macintosh, VMS, VM, …
•   list of available server software:
    –   http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/

•   Web Server Survey
    –   http://www.netcraft.com/Survey/
•   popular server programs:
    –   CERN, NCSA (first ones)
    –   Apache, MS IIS, Netscape servers, ...

                  Apache
•   "A PAtCHy server": a kind of plug-in replacement for
    NCSA httpd
•   under constant development
•   freely available:
     – in source code
     – binaries for many platforms (v. 1.3.x also includes Windows NT)
•   supports HTTP 1.1 since v. 1.2
•   useful addresses:
     – Apache home: http://www.apache.org/
     – http://www.apacheweek.com/
     – NCSA httpd documentation: http://hoohoo.ncsa.uiuc.edu/
     – support via Usenet: comp.infosystems.www.servers.unix

       Where to put the server?
•   the server should run where the information is created
•   choose the host carefully
•   give a DNS alias name to the selected host
    (www.mydomain.mycountry)
•   ServerRoot, DocumentRoot and Log files
    directories should be chosen carefully according to
    rules for all daemons and disk space requirements
•   User Home Pages?
•   CGI rules!

      Apache directory structure
•   can be designed (changed) during installation
    (compilation) process
•   some important directories:
    –   cgi-bin/ - CGI scripts directory (examples present)
    –   conf/ - configuration files for httpd server
    –   htdocs/ - main directory for documents
    –   logs/     - directory with log files (currently empty)
    –   other stuff (bin/, sbin/, src/, man/…)

      Apache configuration files
•   look in conf/ directory:
     – access.conf - access configuration
     – httpd.conf - server configuration
     – mime.types - MIME type to extensions definition
     – srm.conf - resource configuration
     – *.*-dist - distribution templates

    – since v. 1.3.6 it is recommended to use only the main
      configuration file, httpd.conf

    Apache configuration directives
•   general rules:
     – directive names are case insensitive (file and directory names are not)
     – comment lines begin with #
     – one directive per line
     – each line of these files consists of:
           directive data [data2 ... datan]
     – extra whitespace is ignored
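
These rules are simple enough to express as a short parser. The following is only a sketch of the rules listed above, not Apache's own parser: the function name is hypothetical, and quoted arguments and sections such as <Directory> are deliberately not handled.

```python
def parse_directives(text):
    """Return a list of (directive name, [data, ...]) pairs from a config text."""
    directives = []
    for line in text.splitlines():
        line = line.strip()                   # extra whitespace is ignored
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        name, *data = line.split()            # one directive per line
        # Only the directive name is case insensitive; data such as
        # file and directory names keeps its case.
        directives.append((name.lower(), data))
    return directives


sample = """\
# basic server settings
Port 80
   ServerRoot    /home/httpd/
user nobody
"""
for name, data in parse_directives(sample):
    print(name, data)
```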

                  httpd.conf
•   ServerType standalone
•   Port 80
•   User nobody
•   Group nogroup
•   ServerAdmin your_e-mail_address
•   ServerRoot /home/httpd/
•   ErrorLog /home/httpd/logs/error_log
•   TransferLog /home/httpd/logs/access_log
•   PidFile /home/httpd/logs/httpd.pid
•   more directives:
    –   Keep Alive, Spare Servers, Proxy, Cache, Virtual Servers, ...

              httpd.conf (srm.conf)
•   DocumentRoot /home/httpd/htdocs/
•   UserDir public_html
•   DirectoryIndex index.html
•   AccessFileName .htaccess
•   DefaultType text/plain
•   ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
•   more directives:
     –   Icons, Language, Handlers, ...

          httpd.conf (access.conf)
•   defines:
    –   which types of services are allowed
    –   in what circumstances

•   <Directory dir_name> … directives … </Directory>

•   be very careful due to possible problems:
    –   operational
    –   security

                  mime.types
•   list of MIME types known to your server:
    –   format:    type/subtype      file_extension

•   files with other extensions will be sent with the DefaultType

•   add entries according to your needs
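
The lookup described above (file extension to MIME type, with DefaultType as the fallback) can be sketched as follows. The function names are hypothetical and the three sample entries stand in for a full mime.types file; the file content itself is never inspected, only the extension.

```python
import os

# A few sample entries in the mime.types format: type/subtype, then extensions.
MIME_TYPES_TEXT = """\
# type/subtype      file extensions
text/html           html htm
image/gif           gif
image/jpeg          jpeg jpg jpe
"""


def load_mime_types(text):
    """Build an extension -> MIME type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) < 2:
            continue                             # blank line or type without extensions
        mime_type, *extensions = fields
        for extension in extensions:
            table[extension.lower()] = mime_type
    return table


def content_type(filename, table, default_type="text/plain"):
    """Pick the Content-type by file extension; fall back to the default type."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(extension, default_type)


table = load_mime_types(MIME_TYPES_TEXT)
for name in ("index.html", "photo.JPG", "README"):
    print(name, content_type(name, table))
```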

    Starting and stopping Apache
•   if you selected the standalone server type:
    –   simply execute the program (apachectl start)
    –   set up automated startup (during boot)
•   apachectl options: start, stop, configtest
•   Apache dynamically adapts to the workload
•   to stop (restart) the server use:
    –   the kill command (UNIX) (the pid is in the httpd.pid file)
    –   apachectl stop

                    Access control
•   two levels:
    –   per-server (Global Access Configuration file) - using
        directives in httpd.conf (access.conf)
    –   per-directory (Per-directory Access Configuration file) -
        using .htaccess files (you can change this file name with the
        AccessFileName directive in httpd.conf (srm.conf))
•   two ways:
    –   by user/password
    –   by host/domain

         httpd.conf (access.conf)
              DocumentRoot settings
<Directory /home/httpd/htdocs>
   –    instead of Directory it is possible to use
        Location (controls URLs) or Files (controls files)
   –    it is possible to use wildcards here: * ?
Options Indexes FollowSymLinks
   –   an option can be: FollowSymLinks, SymLinksIfOwnerMatch,
       ExecCGI, Includes, Indexes, IncludesNoExec, All, None
AllowOverride All
   –   specifies which options can be overridden by per-directory
       access files
</Directory>

          httpd.conf (access.conf)
                    Scripts directory
<Directory /home/httpd/cgi-bin>
 Options FollowSymLinks
 AllowOverride None
</Directory>
•   later directives (in the order they appear in the configuration
    files) are more specific and take precedence
•   if permitted (AllowOverride), the settings in .htaccess are the most specific

User/password authentication (1)
•   Create a file called .htaccess in the required directory
    (of course, you can also do this at the server level)

       AuthUserFile /home/httpd/admin/.htpasswd
       AuthGroupFile /dev/null
       AuthName ByPassword
       AuthType Basic
       <Limit GET>
       require user username
       </Limit>

    User/password authentication (2)
•   using the htpasswd command, create the password file named in AuthUserFile:
     htpasswd -c /home/httpd/admin/.htpasswd username
•   enter a password of your choice (later you can check the content of
    the .htpasswd file)

•   multiple users (of course, you have to create entries in the .htpasswd file):
     – add users to the require directive in .htaccess
     – create a group file (.htgroup), use the directives AuthGroupFile and
       require group in the .htaccess file
     – use the require valid-user directive (all users from .htpasswd have
       access)

                  It works, but ...
•   the server asks the browser for a user/password to allow access
•   the password is sent over the network encoded, not encrypted
•   the password is not visible in the clear, but can easily be
    decoded by anyone who happens to catch the right
    network packet (“sniffers in action”)
•   this method of authentication is only as safe as telnet-style
    username and password security
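
To see why the encoding offers no protection: with AuthType Basic the browser sends "username:password" in base64, which anyone can reverse without a key. A minimal sketch (the function names and the sample password are made up for illustration):

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header a browser sends for Basic authentication."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


def decode_basic_auth(header_line):
    """Recover (username, password) from a captured Authorization header."""
    name, _, value = header_line.partition(":")
    scheme, _, token = value.strip().partition(" ")
    if name.strip().lower() != "authorization" or scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("username", "secret")
print(header)                     # the password is not readable at a glance ...
print(decode_basic_auth(header))  # ... but decoding needs no key at all
```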

     Host/domain authentication
•   protective: the .htaccess file looks like

    <Limit GET>
     order deny,allow
     deny from all
     allow from hostname/domain
    </Limit>

     Host/domain authentication
•   open: the .htaccess file looks like

    <Limit GET>
     order allow,deny
     allow from all
     deny from hostname/domain
    </Limit>
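
The difference between the two orderings can be sketched as follows. This is an illustration of the evaluation order only, under the assumption that a pattern is either "all" or a (partial) domain name matched against the end of the host name; IP addresses and networks are not handled, the function names are hypothetical, and example.com is a placeholder domain.

```python
def host_matches(host, pattern):
    """True if pattern is "all" or host equals / ends in the given domain name."""
    if pattern.lower() == "all":
        return True
    host = host.lower()
    domain = pattern.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def is_allowed(host, order, allow, deny):
    """Evaluate allow/deny lists in the given order for one client host."""
    allowed = any(host_matches(host, pattern) for pattern in allow)
    denied = any(host_matches(host, pattern) for pattern in deny)
    if order == "deny,allow":
        # deny is evaluated first, allow can override it; default is to allow
        return allowed or not denied
    if order == "allow,deny":
        # allow is evaluated first, deny can override it; default is to deny
        return allowed and not denied
    raise ValueError("unknown order: %r" % order)


# Protective setup: everybody is denied except one domain.
print(is_allowed("www.srce.hr", "deny,allow", allow=["srce.hr"], deny=["all"]))
print(is_allowed("host.example.com", "deny,allow", allow=["srce.hr"], deny=["all"]))

# Open setup: everybody is allowed except one domain.
print(is_allowed("www.srce.hr", "allow,deny", allow=["all"], deny=["example.com"]))
print(is_allowed("host.example.com", "allow,deny", allow=["all"], deny=["example.com"]))
```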

                    Access control
•   it is possible to use authentication by host/domain
    and by user/password together

•   for better security, compile Apache with SSL
    (Secure Sockets Layer) support
    –   the server and client then exchange keys at the beginning
        of the session and all transactions are encrypted

Common Gateway Interface (CGI)
 •   a WWW server is able to communicate with other
     programs (CGI scripts)
 •   CGI scripts can be written in any programming
     language (shell script, Perl, C, …)
 •   CGI scripts can use CGI environment variables
 •   CGI is used for:
         • getting input from the user, forms processing, returning any kind of
            dynamic information, gateways to other services, ...
 •   the workload is on the server side (be careful)

•   the server needs to be configured for CGI operation to
    enforce security procedures:
    ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
•   all files in /cgi-bin/ are considered to be
    executable scripts (regardless of the file name)
•   security measures (with CGI scripts):
    –   parse and check user input
    –   programs should have only the power they require
    –   dynamically generated programs are not permitted
    –   carefully examine all CGI scripts (do not allow users to
        execute their own programs)

    Passing data (GET method)
•   data is simply attached to the end of the URL
    –   ? is used to separate data from URL (http://url?data)
    –   CGI programs are executed with URL address:

•   simple example: <ISINDEX> tag
    –   the browser asks for input from the user and attaches it to the URL
    –   the input is rewritten by the browser (spaces become "+", form
        fields are separated by "&", …)
    –   the server puts the part of the URL after "?" into the environment
        variable QUERY_STRING
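
On the script side, reading GET data therefore means decoding QUERY_STRING. A minimal sketch using Python's standard urllib.parse module; the function name and the sample query are made up, and a real CGI script would pass os.environ instead of a hand-built dictionary.

```python
from urllib.parse import parse_qs


def query_parameters(environ):
    """Decode the data the server placed in QUERY_STRING into a dictionary."""
    # parse_qs turns "+" back into spaces and %XX escapes back into characters;
    # each name maps to a list, because a name may occur more than once.
    return parse_qs(environ.get("QUERY_STRING", ""))


# Stand-in for the environment of a request such as /cgi-bin/search?q=web+server&lang=hr
environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=web+server&lang=hr"}
print(query_parameters(environ))
```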

    Passing data (POST method)
•   recommended method for processing forms
•   on the HTML page with the form you declare which script will be called
    to process the data from the form:
     <FORM METHOD="POST" ACTION="/cgi-bin/script_name">
•   when the user hits the submit button, the client contacts the server and passes
    the request (POST /cgi-bin/script_name) with the data from the form (the data
    follows the request headers as a document)
•   to pass the data to the CGI program, the server uses environment
    variables and "stdin"

    Passing data (POST method)

•   the server executes the CGI script and provides it with:
     – a list of environment variables
     – an input stream of the form contents in name=value pairs
•   the script knows how long this input stream is from the environment
    variable CONTENT_LENGTH
•   general order of steps in a CGI script:
     – read input from "stdin"
     – split name=value pairs and do value conversion (spaces, ...)
     – do something and print out the results in HTML form to "stdout"

    Passing data (POST method)

•   CGI scripts are responsible for formatting their output on
    "stdout" back to the server (the server finally passes
    this information to the client)
•   a CGI script is responsible for generating content-specific
    headers and sending them as the first lines of
    output to the server
    –   for example:
         Content-type: text/plain
         FOLLOWED BY (at least) ONE BLANK LINE!
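
Put together, the POST steps from the last slides might look like the sketch below. It is an illustration only: the function name is hypothetical, the streams are passed in as parameters so the logic can be tried without a server, and a real script would use os.environ, sys.stdin.buffer and sys.stdout.

```python
import io
from html import escape
from urllib.parse import parse_qs


def handle_post(environ, stdin, stdout):
    """Read the form data from stdin and write headers plus an HTML page to stdout."""
    # 1. read exactly CONTENT_LENGTH bytes of input
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = stdin.read(length).decode("ascii")
    # 2. split name=value pairs and convert values ("+" to space, %XX escapes)
    form = parse_qs(data)
    # 3. print the header, at least one blank line, then the document
    stdout.write("Content-type: text/html\n\n")
    stdout.write("<html><body><ul>\n")
    for name, values in form.items():
        for value in values:
            # escape user input before echoing it back into HTML
            stdout.write("<li>%s = %s</li>\n" % (escape(name), escape(value)))
    stdout.write("</ul></body></html>\n")


# Simulated request: what the server would provide to the script.
body = b"name=Ana+K&city=Zagreb"
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
output = io.StringIO()
handle_post(environ, io.BytesIO(body), output)
print(output.getvalue())
```

Escaping the values before printing them follows the "parse and check user input" rule from the CGI security slide.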

        Server side includes (SSI)
•   the server can be configured to scan documents with the
    shtml extension for occurrences of constructions like:
    <!--#command tag1="value1" tag2="value2" -->
    and replace them with the result of the command
•   this concept is used to add:
    –   the current date or the value of any other CGI environment variable
    –   a document's (or other file's) last modification date or size
    –   the contents of another document, inlined into the current document
    –   the output of any other program run on the server side
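
The scan-and-replace idea can be sketched with a regular expression. The sketch handles only one command, echo with a var tag, and takes the variable values from a dictionary; the function name, the sample values and the "(none)" placeholder for unknown variables are choices made for this illustration, not a description of Apache's implementation.

```python
import re

# Matches constructions of the form <!--#echo var="NAME" -->
SSI_ECHO = re.compile(r'<!--#echo\s+var="([^"]*)"\s*-->')


def expand_ssi(document, variables):
    """Replace every echo command in the document with the variable's value."""
    def replace(match):
        return variables.get(match.group(1), "(none)")
    return SSI_ECHO.sub(replace, document)


page = '<p>Served by <!--#echo var="SERVER_NAME" --> on <!--#echo var="DATE_LOCAL" --></p>'
variables = {"SERVER_NAME": "www.srce.hr", "DATE_LOCAL": "Tuesday, 29-Jul-97"}
print(expand_ssi(page, variables))
```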

        API: modules & handlers
•   Apache breaks the handling of a client request down into a
    series of steps:
    –   URL --> Filename translation
    –   Auth ID checking
    –   Auth access checking
    –   Access checking other than above Auth
    –   Determining MIME type of the object requested
    –   `Fixups' - if needed
    –   Actually sending a response back to the client
    –   Logging the request

       API: modules & handlers
•   a handler (a procedure) can be tied to any of those steps
•   a set of handlers makes up a module, e.g. the CGI
    module, log module, server side includes module,
    access module, ...
•   the consistent specification of the steps allows you to connect
    your own modules to Apache, which replace the old ones or
    add new capabilities
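
The idea of fixed steps with pluggable handlers can be modelled in a few lines. This is a conceptual sketch only and not the Apache module API (which is written in C): the phase names, the Server class and the three handlers are all hypothetical.

```python
# One name per request-handling step, in the order the steps are run.
PHASES = ["translate", "auth_id", "auth_access", "access",
          "mime_type", "fixups", "response", "logging"]


class Server:
    def __init__(self):
        self.handlers = {phase: [] for phase in PHASES}

    def add_module(self, module):
        """A module is a set of handlers: a dict mapping phase name -> function."""
        for phase, handler in module.items():
            self.handlers[phase].append(handler)

    def handle(self, request):
        """Run every registered handler, phase by phase, on the request."""
        for phase in PHASES:
            for handler in self.handlers[phase]:
                handler(request)
        return request


def translate(request):
    request["filename"] = "/home/httpd/htdocs" + request["url"]


def set_type(request):
    is_html = request["filename"].endswith(".html")
    request["content_type"] = "text/html" if is_html else "text/plain"


def log(request):
    request["log"] = "%s -> %s" % (request["url"], request["filename"])


server = Server()
server.add_module({"logging": log})   # registration order does not matter
server.add_module({"translate": translate, "mime_type": set_type})
result = server.handle({"url": "/index.html"})
print(result)
```

Because the order of the phases is fixed by the server, a module only has to say which step each of its handlers belongs to.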

                 Virtual servers
•   one server may listen on many host names - virtual
    servers (same port, different hostnames)
•   part of the basic server configuration (httpd.conf)
    <VirtualHost hostname> … </VirtualHost>
•   each virtual server may have totally different
    content, configuration, separate log and error files, …

•   an alternative is to run another server on a different port
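
With several host names on the same port, the server has to pick the configuration from the host name the client asked for. A minimal sketch of that lookup, assuming the name arrives in the request's Host header; the host names, directories and function name below are made up for illustration.

```python
# Hypothetical virtual servers: host name -> document root.
VIRTUAL_HOSTS = {
    "www.nowhere.hr": "/home/httpd/htdocs",
    "info.nowhere.hr": "/home/httpd/info",
}
DEFAULT_ROOT = "/home/httpd/htdocs"   # used when no virtual server matches


def document_root(host_header):
    """Select the document root for the host name sent by the client."""
    hostname = host_header.split(":", 1)[0].strip().lower()   # drop an optional :port
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


for host in ("info.nowhere.hr", "INFO.nowhere.hr:80", "unknown.nowhere.hr"):
    print(host, "->", document_root(host))
```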

                 Log analysis tools
•   the server logs access information to a file:
    –   client host,
    –   date,
    –   client request,
    –   status,
    –   count of the bytes sent by server
•   it is possible (and easy) to produce many kinds of
    activity reports from that data
•   plenty of freeware log analyzers (wwwstat, analog, …)
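
A small report of this kind takes only a few lines. The sketch below assumes the access log is in Common Log Format (host, two identity fields, date, request, status, bytes); the two sample lines are invented for illustration and the function names are hypothetical.

```python
import re
from collections import Counter

# host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


def report(lines):
    """Summarise requests, bytes sent and status codes for a list of log lines."""
    entries = [entry for entry in map(parse_line, lines) if entry is not None]
    return {
        "requests": len(entries),
        "bytes": sum(entry["bytes"] for entry in entries),
        "by_status": Counter(entry["status"] for entry in entries),
    }


sample_log = [
    'regoc.srce.hr - - [29/Jul/1997:12:56:15 +0000] "GET /index.html HTTP/1.0" 200 2320',
    'client.nowhere.hr - - [29/Jul/1997:12:57:02 +0000] "GET /missing.html HTTP/1.0" 404 -',
]
print(report(sample_log))
```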

                  Sources
•   “webwatch”:
    –   http://www.w3.org/
    –   http://www.ietf.org/
    –   http://www.hwg.org/
    –   http://www.irwa.org/
    –   http://www.apache.org/
•   books:
    –   Nancy J. Yeager, Robert E. McGrath: “Web Server
        Technology”, Morgan Kaufmann Publishers, Inc., 1996
    –   Stephen Spainhour, Valerie Quercia: “Webmaster in a
        Nutshell”, O’Reilly, 1996
    –   Ben Laurie, Peter Laurie: “Apache - The Definitive Guide”,
        O’Reilly, 1997

                  Future
•   HTTP 1.1 & WebDAV
•   CSS & XSL
•   RDF - Resource Description Framework
•   XML - Extensible Markup Language
•   XHTML - Extensible HTML
•   PNG - Portable Network Graphics
•   Java & Jini
•   Dial tone → Web tone
•   …
               “I think there may be a world market for perhaps 5 computers”
                                               (Thomas Watson, IBM, 1943)

                                                     “The internet is a fad”
                                               (Bill Gates, Microsoft, 1981)

Questions?

