Web Servers UNC Charlotte by pengxuebo


									Web Servers
  Generic Overview

Web Servers
   A web server can be:
       A computer program
            Responsible for accepting HTTP requests from
             clients (web browsers)
                 Returns HTTP responses with optional data
                 Usually web pages
                     HTML documents

                     Linked objects (images, etc.).

       A computer that runs a computer program
        which provides the above functionality
Common Features
   HTTP
       Accepts HTTP requests from a client
       Provides HTTP responses to the client
          Typically an HTML document
          Can also be:
               Raw text file
               Image
               Some other type of document
                   defined by MIME-types

       If an error is found in the client request or
        while trying to serve the request
           Web server has to send an error response
               May include custom HTML
               May have text messages to better explain the
                problem to end users.
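As an illustration of the points above, the following minimal sketch picks a status code and a MIME type for a requested file name. The function name `describe_response` and the `exists` flag are assumptions made for this example; a real server would check the file system and build a full HTTP response.

```python
import mimetypes
from http import HTTPStatus


def describe_response(path, exists=True):
    """Return (status_code, content_type) a server might send for `path`.

    `exists` stands in for a real file-system check (an assumption of
    this sketch).
    """
    if not exists:
        # Error responses commonly carry a small HTML page for end users.
        return HTTPStatus.NOT_FOUND.value, "text/html"
    content_type, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown.
    return HTTPStatus.OK.value, content_type or "application/octet-stream"
```

For example, `describe_response("index.html")` reports an HTML document, while `describe_response("missing.html", exists=False)` reports a 404 error with an HTML body.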
Common Features

   Logging
       Web servers write detailed information
        to log files
          Client requests
          Server responses

       Allows the webmaster to collect data
            Running log analyzers
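A log analyzer can be sketched in a few lines. The example below assumes the log uses the widely seen Common Log Format; the regular expression, the `summarize` function, and the sample lines are illustrative only and are not taken from any particular analyzer.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "method path protocol" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Count responses per status code and requests per path."""
    statuses = Counter()
    paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        statuses[match.group("status")] += 1
        paths[match.group("path")] += 1
    return statuses, paths


# Made-up sample entries for illustration.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2007:13:55:40 -0700] "GET /missing.html HTTP/1.1" 404 512',
    '192.0.2.1 - - [10/Oct/2007:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 2326',
]
```

Calling `summarize(SAMPLE)` gives the webmaster a quick view of which pages are requested most and how many requests failed.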
Additional Features

   Authentication
       Optional authorization before allowing
        access to some or all resources
            Requires a user name and password
   Handle
       Static content
       Dynamic content
            Support one or more related interfaces
                 SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET,
                  Server APIs such as NSAPI, ISAPI, etc.
Additional Features

   HTTPS support
       Via SSL or TLS
       Allows secure (encrypted) connections
            Using port 443 instead of port 80
   Content compression
       E.g. by gzip encoding
       Reduces the size of the responses
            Lower bandwidth usage, etc.
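The effect of gzip encoding can be sketched as follows. The helper name `maybe_compress` is hypothetical, and the handling of the `Accept-Encoding` header is deliberately simplified (quality values such as `q=0` are ignored).

```python
import gzip


def maybe_compress(body, accept_encoding):
    """Return (payload, headers), gzip-compressing when the client allows it.

    Simplification: any mention of "gzip" in Accept-Encoding is treated
    as acceptance; q-values are not interpreted.
    """
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "gzip" in offered:
        payload = gzip.compress(body)
        return payload, {"Content-Encoding": "gzip",
                         "Content-Length": str(len(payload))}
    return body, {"Content-Length": str(len(body))}
```

Repetitive content such as HTML usually shrinks considerably, which is where the bandwidth saving comes from.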
Additional Features

   Virtual hosting
       Serve many web sites using one IP
   Large file support
       Serve files greater than 2 GB
            Typical 32 bit OS restriction
   Bandwidth throttling
       Limit the speed of responses
          Do not saturate the network
          Able to serve more clients
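One simple way to picture bandwidth throttling is to send a response in chunks and wait between chunks. The generator below only computes the pacing; the name `paced_chunks` and its parameters are assumptions for this sketch, and a real server would sleep or schedule a timer for each returned delay.

```python
def paced_chunks(body, chunk_size, bytes_per_second):
    """Yield (chunk, delay_seconds) pairs that keep output at the given rate."""
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Time this chunk should occupy on the wire at the target rate.
        yield chunk, len(chunk) / bytes_per_second
```

With a limit of 1000 bytes per second, a 2500-byte body split into 1000-byte chunks takes about 2.5 seconds in total, leaving the rest of the link free for other clients.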
Origin of returned content

Where does it all come from?
Content Origin
   The origin of the content may be:
       Static
          Comes from a file that already exists in a
           file system
       Dynamic
          Dynamically generated by another program
               A script
               An Application Programming Interface (API) called by
                the web server
   Static content is usually delivered much
    faster than dynamic content
      2 to 100 times
       Especially if the latter involves data pulled
        from a database
Path translation

How does it find it?
Path translation
   Web servers map the path component
    of a Uniform Resource Locator (URL) to either:
       Local file system resource
            Static requests
       Internal or external program name
            Dynamic requests
   For a static request the URL path
    specified by the client is relative to the
    Web server's root directory
Path translation
   Consider the following URL requested by a client:
       http://www.example.com/path/file.html
   Client's web browser translates it into a connection to
    www.example.com with the following HTTP 1.1 request:
       GET /path/file.html HTTP/1.1
       Host: www.example.com
   The web server on www.example.com then appends the
    given path to the path of its root directory
       On Unix machines, this is commonly /var/www/htdocs.
       The result would then be the local file system resource:
            /var/www/htdocs/path/file.html
   Web server then reads the file, if it exists, and sends a
    response to the client's web browser
   Response will describe the content of the file and
    contain the file itself
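The path translation described above can be sketched as a small function. The root `/var/www/htdocs` comes from the slide; the function name `translate_path` and the normalization step that keeps `..` segments from escaping the root are additions of this sketch, not a description of any specific server.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/var/www/htdocs"  # root directory used in the slide's example


def translate_path(url, root=DOCUMENT_ROOT):
    """Map the path component of `url` to a file under `root`."""
    url_path = unquote(urlsplit(url).path)
    # Normalizing an absolute path collapses "." and ".." segments,
    # so the result cannot climb above the root directory.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    return root.rstrip("/") + normalized
```

For the URL in the example, `translate_path("http://www.example.com/path/file.html")` yields `/var/www/htdocs/path/file.html`.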
   Web servers:
       Serve requests quickly
       From more than one TCP/IP connection at a time
   Main key performance parameters are:
       number of requests per second
            depends on the type of request, etc.
       latency: response time in milliseconds for each new
        connection or request
       throughput in bytes per second
            depending on file size, cached or not cached content,
             available network bandwidth, etc.
   Measured under:
       Varying load of clients
       Varying requests per client
   Performance parameters may vary
    noticeably depending on the number
    of active connections
       A fourth parameter is the concurrency
        level supported by a web server under a
        specific configuration
   Specific server model used to
    implement a web server program can
    bias the performance and scalability
    level that can be reached under heavy
    load or when using high end hardware
       many CPUs, disks, etc.
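The three main parameters can be computed from a simple benchmark run. In this sketch each sample is a made-up `(latency_ms, response_bytes)` pair and `duration_s` is the wall-clock length of the run; the function name and the dictionary keys are assumptions for illustration.

```python
def summarize_run(samples, duration_s):
    """Compute requests/second, mean latency and throughput for one run."""
    request_count = len(samples)
    total_bytes = sum(size for _latency, size in samples)
    total_latency = sum(latency for latency, _size in samples)
    return {
        "requests_per_second": request_count / duration_s,
        "mean_latency_ms": total_latency / request_count,
        "throughput_bytes_per_second": total_bytes / duration_s,
    }
```

Repeating the run with different numbers of clients and requests per client shows how these figures change as the number of active connections grows.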
Load limits
   Web server (program) has defined load limits
       It can handle only a limited number of concurrent
        client connections per IP address (and IP port)
            Usually between 2 and 60,000
            Default between 500 and 1,000
   Can serve only a certain maximum number of
    requests per second depending on:
       its own settings
       the HTTP request type
       content origin (static or dynamic)
       whether the served content is or is not cached
       the hardware and software limits of the native OS
   When a web server is near to or over its limits
       It becomes overloaded and thus unresponsive
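A connection limit can be pictured with a small counter. The class below is a hypothetical sketch, not the mechanism of any particular server: it accepts connections up to a configured maximum and answers 503 (Service Unavailable) beyond that.

```python
class ConnectionLimiter:
    """Track active connections and refuse new ones above a fixed limit."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.active = 0

    def try_accept(self):
        """Return 200 if the connection is accepted, 503 if the server is full."""
        if self.active >= self.max_connections:
            return 503
        self.active += 1
        return 200

    def release(self):
        """Call when a client connection closes."""
        if self.active > 0:
            self.active -= 1
```

Rejecting early with a clear error keeps the connections already accepted responsive instead of slowing every client down.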
Overload causes

   [Figure: a sample daily graph of a web server's load, showing a spike in
    the load early in the day]
    Overload causes
   At any time web servers can be overloaded because of:
       Too much legitimate web traffic
            Thousands or even millions of clients hitting the web site in a short
             interval of time
       DDoS (Distributed Denial of Service) attacks
       Computer worms
            Abnormal traffic because of millions of infected computers (not
       XSS viruses
            Millions of infected browsers and/or web servers
       Internet web robots
            Traffic not filtered / limited on large web sites with very few resources
             (bandwidth, etc.)
       Internet (network) slowdowns
            Client requests are served more slowly and the number of connections
             increases so much that server limits are reached
       Web servers (computers) partial unavailability
            Required / urgent maintenance or upgrade
            HW or SW failures
            Back-end (e.g. DB) failures, etc.
            Remaining web servers get too much traffic and they become overloaded
Overload symptoms
   The symptoms of an overloaded web server are:
       Requests are served with (possibly long) delays
           from 1 second to a few hundred seconds

       500, 502, 503, 504 HTTP errors are returned to
            Sometimes also unrelated 404 error or even 408
             error may be returned
       TCP connections are refused or reset (interrupted)
        before any content is sent to clients
       In very rare cases, only partial contents are sent
            This behavior may well be considered a bug
                 Even if it stems from unavailable system resources
Anti-overload techniques
   To partially overcome load limits and to prevent overload
    use techniques like:
       Managing network traffic by using:
          Firewalls
                 Block unwanted traffic from bad IP sources or having bad
            HTTP traffic managers
                 Drop, redirect or rewrite requests having bad HTTP patterns
            Bandwidth management and traffic shaping
                 Smooth down peaks in network usage
       Deploying web cache techniques
       Using different domain names to serve different content
        (static and dynamic) by separate Web servers, e.g.:
           http://images.example.com

           http://www.example.com
    Anti-overload techniques
   Techniques continued:
       Use different domain names and/or computers to
        separate big files from small/medium files
            Be able to fully cache small and medium sized files
            Efficiently serve big or huge (over 10 - 1000 MB) files
             by using different settings
       Using many Web servers (programs) per computer
            Each bound to its own network card and IP address
       Use many Web servers that are grouped together
            Act or are seen as one big Web server
            See Load balancer
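Grouping several web servers behind one name is often done with a load balancer. The sketch below shows the simplest policy, round robin; the class name and the backend names are invented for the example, and real balancers also track server health and load.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend servers in rotation so they share the traffic."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the backend that should serve the next request."""
        return next(self._cycle)
```

With two backends, successive calls to `pick()` alternate between them, so clients see what looks like one big web server.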
Anti-overload techniques

   Techniques continued:
       Add more hardware resources
            RAM, disks, NICs, etc.
       Tune OS parameters
          Hardware capabilities
          Usage

       Use more efficient computer programs
        for web servers, etc.
       Use workarounds
            Especially if dynamic content is involved
Historical notes

   The world's first web server.

   In 1989 Tim Berners-Lee proposed to his employer
    CERN (European Organization for Nuclear Research) a
    new project
       Goal of easing the exchange of information between
        scientists by using a hypertext system
   As a result of the implementation of this project, in
    1990 Berners-Lee wrote two programs:
       A browser called WorldWideWeb
       The world's first web server, which ran on NeXTSTEP
Historical notes
   The first web server in U.S.A. was installed
    on December 12, 1991
       Bebo White at SLAC
       After returning from a sabbatical at CERN
   Between 1991 and 1994 the simplicity and
    effectiveness of early technologies used to
    surf and exchange data through the World
    Wide Web helped to
       Port them to many different operating systems
       Spread their use among lots of different social
        groups of people
            First in scientific organizations
            Then in universities
            Finally in industry
    Historical notes
   In 1994 Tim Berners-Lee decided to
    constitute the World Wide Web
    Consortium (W3C)
       Regulate the further development of the
        many technologies involved through standardization
          HTTP
          HTML

          etc.

   The following years saw an exponential
    growth of the number of web sites and
   As of July 2007, the most common HTTP serving programs were:
       Apache HTTP Server
       Microsoft
            Microsoft is the sum of sites running
                  Microsoft-Internet-Information-Server
                  Microsoft-IIS, Microsoft-IIS-W
                  Microsoft-PWS-95
                  Microsoft-PWS
       Sun
            The sum of sites running:
                  SunONE,
                  iPlanet-Enterprise
                  Netscape-Enterprise
                  Netscape-FastTrack
                  Netscape-Commerce
                  Netscape-Communications
                  Netsite-Commerce
                  Netsite-Communications
       lighttpd
   There are thousands of different web
    server programs available
       Many specialized for very specific purposes
       The fact that a web server is not very
        popular does not necessarily mean it has
            Lots of bugs
            Poor performance
   See Category:Web server software for a
    longer list of HTTP server programs.
   The most popular web servers, used for
    public web sites, are tracked by
       Netcraft Web Server Survey
   Details given by
       Netcraft Web Server Reports
   According to this site:
       Apache has been the most popular web
        server on the Internet since April of 1996
       August 2007 Netcraft Web Server Survey:
            50.92% web sites on the Internet use Apache
            34.28% web sites use IIS
       Among the active sites:
            48.42% running Apache
            36.21% running IIS
Popular Web Servers

Who’s running the show?
What are they?
The big two:

We’re number one!
   Apache HTTP Server, referred to simply as Apache:
       A web server
       Notable for playing a key role in the initial growth of the
        World Wide Web
   Apache
       First viable alternative to Netscape Communications
        Corporation web server
            Currently known as Sun Java System Web Server
   Evolved to rival other Unix-based web servers
       Functionality and performance
   Since April 1996 Apache has been the most popular
    HTTP server on the World Wide Web
       Since March 2006 however it has experienced a steady
        decline of its market share
       Lost mostly against Microsoft IIS and the .NET platform
   September 2007: Apache served 50% of all websites
   Project's name was chosen for two reasons:
       Respect for the Native American Indian Apache tribe
            Well-known for their endurance and their skills in warfare
       Project's roots as a set of patches to the codebase of
        NCSA HTTPd 1.3
            Making it "a patchy" server
   Apache is developed and maintained by
       an open community of developers
       under the auspices of the Apache Software Foundation
   Available for a wide variety of OSs
      Microsoft Windows
      Novell NetWare
      Unix-like operating systems: e.g. Linux and Mac
       OS X
   Released under the Apache License
       Apache is free software / open source software.
Apache History
   The first version of the Apache web server was
    created by Robert McCool
       Heavily involved with the National Center for
        Supercomputing Applications web server
            Known simply as NCSA HTTPd
       When Rob left NCSA in mid-1994
            Development of httpd stalled
            Left a variety of patches for improvements circulating
             through e-mails
   Rob McCool was not alone in his efforts
       Several other developers helped form the original
        "Apache Group":
            Brian Behlendorf, Roy T. Fielding, Rob Hartill, David
             Robinson, Cliff Skolnick, Randy Terbush, Robert S.
             Thau, Andrew Wilson, Eric Hagberg, Frank Peters, and
             Nicolas Pioch
   Version 2 of the Apache server was a substantial
    rewrite of much of the Apache 1.x code
       Strong focus on further modularization and the
        development of a portability layer, the Apache Portable
       Apache 2.x core: several major enhancements over
        Apache 1.x:
            UNIX threading
            Better support for non-Unix platforms
            New Apache API
            IPv6 support
       First alpha release on March 2, 2000
       First general availability release on April 6, 2002
   Version 2.2 introduced a new authorization API that
    allows for more flexibility
       Also features improved cache modules and proxy

   Apache supports a variety of features
       Many implemented as compiled modules
        which extend the core functionality
       Range from server-side programming
        language support to authentication
          Common language interfaces support
                 mod_perl, mod_python, Tcl, and PHP
            Popular authentication modules include
                 mod_access, mod_auth, and mod_digest.
   Other features include:
       SSL and TLS support
            mod_ssl
       A proxy module
       A useful URL rewriter
          also known as a rewrite engine, implemented under
       Custom log files
            mod_log_config
       Filtering support
            mod_include
            mod_ext_filter
   Apache logs can be analyzed through a web browser
    using free scripts
       AWStats/W3Perl
       Visitors
   Virtual hosting allows one Apache installation to
    serve many different actual websites
       For example, one machine, with one Apache
        installation could simultaneously serve:
            www.example.com
            www.test.com
            test47.test-server.test.com
            etc.
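Name-based virtual hosting amounts to choosing a document root from the `Host` header of each request. The mapping and directory names below are hypothetical and are not Apache configuration syntax; the sketch also assumes host names rather than IPv6 literals.

```python
# Hypothetical mapping from site name to document root.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "www.test.com": "/var/www/test",
    "test47.test-server.test.com": "/var/www/test47",
}
DEFAULT_ROOT = "/var/www/htdocs"


def document_root_for(host_header):
    """Pick the document root for a request based on its Host header."""
    # Drop an optional ":port" suffix and ignore letter case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)
```

All three sites can share one IP address and one server process; only the `Host` header decides which files are served.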
   Apache features
       Configurable error messages
       DBMS-based authentication databases
       Content negotiation
   It is also supported by several graphical user
    interfaces (GUIs)
       Permit easier, more intuitive configuration of the server
   Apache is used to serve both static content
    and dynamic Web pages
       Many web applications are designed expecting
        the environment and features that Apache
   Apache is the web server component of the
    popular XAMPP web server application stack
       Partners with
            MySQL
            PHP/Perl/(Python) programming languages
   Apache is redistributed as part of various
    proprietary software packages including the
       Oracle Database
       IBM WebSphere application server
   Mac OS X integrates Apache
       Its built-in web server
       Support for its WebObjects application server
   It is also supported by Borland in the
       Kylix and Delphi development tools
   Apache is included with Novell NetWare 6.5
       Default web server
   Apache is used for many other tasks where
    content needs to be made available in a
    secure and reliable way
       Sharing files from a personal computer over the
       A user who has Apache installed on their desktop
        can put arbitrary files in Apache's document
        root which can then be shared
   Programmers developing web applications
       Locally installed version of Apache
       Preview and test code as it is being developed

   Microsoft Internet Information
    Services (IIS) is the main competitor
    to Apache
       Trailed by
          Sun Java System Web Server
          Host of other applications
                such as Zeus Web Server
   The software license under which
    software from the Apache Foundation is
    distributed is a distinctive part of the
    Apache HTTP Server's history and presence
    in the open source software community
       The Apache License allows for the distribution of
        both open and closed source derivations of the
        source code
   The Free Software Foundation does not
    consider the Apache License to be
    compatible with version 2 of the GNU
    General Public License (GPL)
       Software licensed under the Apache License
        cannot be integrated with software that is
        distributed under the GPL
   It is a free software license
       Incompatible with the GPL
            Has a specific requirement that is not in the GPL
            Has certain patent termination cases that the
             GPL does not require
   However, version 3 of the GPL includes a
    provision (Section 7e) which allows it to be
    compatible with licenses that have patent
    retaliation clauses, including the Apache License
   The name Apache is a registered trademark
    and may only be used with the trademark
    holder's express permission
Microsoft IIS

We’re # 2…
   Microsoft Internet Information Services (IIS)
       Formerly called Internet Information Server
       Set of Internet-based services for servers using
        Microsoft Windows
       World's second most popular web server in terms of
        overall websites
            September 2007: it served 34.94% of all websites
             and 36.63% of all active websites (Netcraft)
       Services currently include:
            FTP
            SMTP
            NNTP
            HTTP/HTTPS
History of IIS

   IIS initially released as additional set
    of Internet based services for Windows
    NT 3.51
       IIS 2.0 added support for the Windows NT
       IIS 3.0 introduced the Active Server Pages
        dynamic scripting environment
       IIS 4.0 dropped support for the Gopher protocol
            Bundled with Windows NT as a separate "Option
             Pack" CD-ROM
History of IIS

   Current shipping versions of IIS:
       7.0 for Windows Vista
       6.0 for Windows Server 2003
            Added support for IPv6
       5.1 for Windows XP Professional
            IIS 5.1 for Windows XP is a restricted version
             of IIS that supports only 10 simultaneous
             connections and a single web site
History of IIS

   FastCGI module available for IIS5.1,
    IIS6 and IIS7
   Windows Vista does not install IIS
    7.0 by default
       Can be selected among the list of optionally
        installed components
       IIS 7.0 on Vista does not limit the number of
        connections allowed
           Restricts performance based on active
            concurrent requests
   Earlier versions of IIS had many vulnerabilities
       Chief among them CA-2001-19
            Led to the infamous Code Red worm
   Version 7.0 currently has no reported issues
   In perspective, as of 11 September 2007, the
    free software Apache web server has one
    unpatched reported issue
       Affecting only MS Windows systems
       Rated "less critical"

   IIS 6.0 opted to change the behavior
    of pre-installed ISAPI handlers
       Many of which were culprits in the
        vulnerabilities on 4.0 and 5.0
       Reduced the attack surface of IIS
       IIS 6.0 added a feature called "Web
        Service Extensions"
            Prevents IIS from launching any program
             without explicit permission by an

   In IIS 7.0 the components were modularized
       Only the required components have to
        be installed
       Further reducing the attack surface
       Security features such as URLFiltering
        were added
            Rejects suspicious URLs based on a user
             defined rule set

   In IIS 5.1 and lower:
       By default all websites were run
          In-process
          Under the System account
                a default Windows account with elevated rights
   In IIS 6.0 all request handling processes
    have been brought under the Network
    Service account
       Has significantly fewer privileges
       If there is an exploit in a feature or custom code
            Wouldn't necessarily compromise the entire
            Given the sandboxed environment the processes
       Contains a new kernel HTTP stack (http.sys)
            Stricter HTTP request parser and response cache
             for both static and dynamic content
Authentication mechanisms

   IIS 5.0 and higher support the
    following authentication mechanisms:
       Basic access authentication
       Digest access authentication
       Integrated Windows Authentication
       .NET Passport Authentication
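Of the mechanisms listed, Basic access authentication is the simplest to illustrate: the client sends the user name and password, joined by a colon and Base64-encoded, in an `Authorization` header. The helper names below are assumptions for this sketch and do not reflect how IIS implements the scheme.

```python
import base64


def encode_basic(user, password):
    """Build the Authorization header value a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header):
    """Return (user, password) from a Basic header value, or None if invalid."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers bad Base64 and bad UTF-8
        return None
    user, separator, password = decoded.partition(":")
    if not separator:
        return None
    return user, password
```

Because Base64 is an encoding and not encryption, Basic authentication is normally combined with HTTPS so the credentials are not readable on the network.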
Internet Information Services
Internet Information Services 7.0
   Debuting with Windows Vista
       To be included in Windows Server 2008
   IIS 7.0 features a modular architecture
       Instead of a monolithic server which features all
       IIS 7 has a core web server engine
       Modules offering specific functionality can be
        added to the engine to enable its features
       Advantages
           Only the features required need be enabled

           The functionalities can be extended by using
            custom modules
Internet Information Services 7.0
   IIS 7 will ship with a handful of modules
       Microsoft will make other modules available
       The following sets of modules are slated to ship
        with the server:
          HTTP Modules

          Security Modules

          Content Modules

          Compression Modules

          Caching Modules

          Logging and Diagnostics Modules that
           integrates with the new configuration store, as
           well as the new management environment
Internet Information Services 7.0
   Writing extensions to IIS 7 using ISAPI has been
    deprecated in favor of the module API, using which
    modules can plug in anywhere in the request processing
       Much of IIS's own functionality is built on this API
            Developers will have much more control over a request
             process than was possible in prior versions
       Modules can be written in C++ or by implementing the IHttpModule
        interface of the .NET Framework
       Modules can be loaded globally where the services provided
        by the module can affect all sites, or loaded on a per-site basis
       IIS 7 has an integrated mode application pool where .NET
        modules are loaded into the pipeline using the module API,
        rather than ISAPI.
            As a result ASP.NET code can be used with all requests to the
       For applications requiring strict IIS 6.0 compatibility, the
        Classic application pool mode loads ASP.NET as an ISAPI.
Internet Information Services 7.0
   A significant change from previous versions:
       All web server configuration information is stored
        solely in XML configuration files
       Instead of in the metabase
   The server has a global configuration file
       Provides defaults
       Each virtual web's document root (and any
        subdirectory thereof) may contain a web.config
            Containing settings that augment or override the
Internet Information Services 7.0
   Changes to these files take effect
       Marks a significant departure from previous
        versions whereby web interfaces, or machine
        administrator access, were required to change
        simple settings such as default document, active
        modules and security/authentication
   It also eliminates the need to perform
    metabase synchronization between multiple
    servers in a farm of web servers
Internet Information Services 7.0

   Features a completely rewritten
    administration interface
       Takes advantage of modern MMC
        features such as
          Task panes
          Asynchronous operation

       Configuration of ASP.NET is more fully
        integrated into the administrative
Internet Information Services 7.0
   Other changes:
       PICS content ratings, support for Microsoft Passport,
        and server-side image maps are no longer included
       Executing commands via server-side includes is no
        longer permitted.
       IISRESET -reboot has been removed
       The CONVLOG tool, which converts IIS log files into
        NCSA format, has been removed
       Support for enabling a folder for "Web Sharing" via the
        Windows Explorer interface has been removed.
       IIS Media Pack, which allows IIS to be used as a bare-
        bones media server, without using Windows Media
       New FTP module, that integrates with the new
        configuration store, as well as the new management

   Concentrated on HTTP servers
   Apache and IIS are the main web
    serving tools
   Apache still king
       IIS Up and coming
   Usage tracked
       Netcraft Web Server Survey
