Apache Web Server

					Advanced Unix

          Apache Web Server
          November 29, 2005
Web Servers
   Tim Berners-Lee is credited with having
    created the World Wide Web
    – He was a researcher at the European
      high-energy particle physics lab, the Conseil
      Européen pour la Recherche Nucléaire
      (CERN), in Geneva, Switzerland.
    – A tool was needed to enable collaboration
      between physicists and other researchers
Web Servers
   Tim Berners-Lee wrote a proposal called
    HyperText and CERN in 1989
    – an extension of the gopher concept but
      incorporated many new ideas and features,
       • HTML (HyperText Markup Language)
       • HTTP (HyperText Transfer Protocol)
       • Web browser client software program
    – 1989 it was first installed at CERN
    – 1991 it was fully operational
Web Servers
   Many types of web server exist
    – For Linux the primary server is Apache
   Fedora Core 3 comes with:
    –   Apache
    –   Tux
    –   Stronghold
    –   Zope
    –   BOA
    –   Jigsaw, etc.
Apache Overview
   The “A Patchy” Web server
    – put together over time by the Apache group
    – Based on the National Center for
      Supercomputing Applications (NCSA) Web server
      • The NCSA was created by the National Science
        Foundation (NSF) and the state of Illinois in 1986
        at the University of Illinois
   Apache is free, open-source
    Apache Overview
 Configured with Text files
 Dependable
 Available for numerous platforms,
    – even Windows
   Netcraft.Com shows 76,000,000 web sites
    – 70% are Apache
    – 21% are Microsoft
Apache Overview
   There are two core versions of Apache
    – Version 1.3.x
       • Fast enough for most sites
       • Particularly on 1 and 2 CPU systems
    – Version 2.0.x
       •   More features
       •   filters
       •   threads
       •   portability
       •   Scales to much higher loads
Apache Configuration Tool
   Most installations of Linux now have
    a gui to configure Apache
    – system-config-httpd
Setting Up Apache
 First Screen will
 Server Name
 Webmasters e-mail
 Set up for http or
Setting Up Apache
   Virtual hosts
    – Intranets
    Setting Up Apache
   Virtual Hosts
   Edit
   Directory of the server
   Virtual Host name
   E-mail of Web Master
Setting Up the Virtual Host
 Site Configuration
 What is linked to the domain name or
  IP address.
Setting Up Apache
 Lock On Browser
Setting Up Apache
 Logs
 Why Send a Log to a program?
Setting Up Apache
 CGI Scripts
 Perl Scripts
 Useful
Setting Up Apache
   Directories
    – For Multiple
Testing Apache
 Now if Apache is running
 Create two files
    – index.htm
    – phptest.php
   Save files in:
    – /var/www/html/
    – Document Root Directory
   Looks like this:
   File looks like this:
Testing Apache
 Open the web browser on the system
  where Apache is configured.
 In the Address bar type in the IP
  Address of the system.
Testing Apache
 Now test Apache from another
  machine on the network.
 Open a web browser then type IP
  Address in the address bar.
 PHP is a script language for web
 Comes from Perl
 Great for databases and Content
  Management Systems (CMS)
   http://<your-ip>/phptest.php
   Looks like this:
Apache Configuration
Prefork MPM

 Apache 1.3 and Apache 2.0 Prefork
 Each child handles one connection at a time
 Many children
 High memory requirements
 “You’ll run out of memory before CPU”
Prefork Directives (Apache 2.0)

 StartServers
 MinSpareServers
 MaxSpareServers
 MaxClients
 MaxRequestsPerChild
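
Because each prefork child serves one connection, the warning about running out of memory before CPU suggests sizing MaxClients from RAM. A rough sketch in Python, assuming made-up figures for total memory, memory reserved for the OS, and per-child size (none of these numbers come from the slides; measure them on the real system):

```python
def estimate_max_clients(total_mb, reserved_mb, per_child_mb):
    """Rough upper bound for MaxClients under the prefork MPM.

    Each prefork child serves one connection at a time, so the number of
    children that fit in RAM without swapping bounds the number of
    simultaneous connections. All inputs are assumptions from the caller.
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = max(total_mb - reserved_mb, 0)
    return usable_mb // per_child_mb


# Hypothetical machine: 1024 MB RAM, 256 MB kept for the OS, 12 MB per child.
print(estimate_max_clients(1024, 256, 12))  # 64
```

The result is only a starting point; shared memory between children makes the true per-child cost hard to measure.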
Worker MPM

 Apache 2.0 and later
 Multithreaded within each child
 Dramatically reduced memory
 Only a few children (fewer than prefork)
Worker Directives

 MinSpareThreads
 MaxSpareThreads
 ThreadsPerChild
 MaxClients
 MaxRequestsPerChild
KeepAlive Requests

 Persistent connections
 Multiple requests over one TCP socket

   Directives:
    – KeepAlive
    – MaxKeepAliveRequests
    – KeepAliveTimeout
Apache 1.3 and 2.0
Performance Characteristics

                 or Both?
Prefork MPM
   High memory usage
   Highly tolerant of faulty modules
   Highly tolerant of crashing children
   Fast
   Well-suited for 1 and 2-CPU systems
   Tried-and-tested model from Apache 1.3
   “You’ll run out of memory before CPU.”
Worker MPM
   Low to moderate memory usage
   Moderately tolerant of faulty modules
   Faulty threads can affect all threads in
    the same child
   Highly scalable
   Well-suited for multiple processors
   Requires a mature threading library
    (Solaris, AIX, Linux 2.6 and others work well)
   Memory is no longer the bottleneck.
Important Performance
 sendfile() support
 DNS considerations
 stat() calls
 Unnecessary modules
sendfile() Support
 No more double-copy
 Zero-copy*
 Dramatic improvement for static files
 Available on
    –   Linux 2.4.x
    –   Solaris 8+
    –   FreeBSD/NetBSD/OpenBSD
    –   ...

* Zero-copy requires both OS support and NIC driver support.
DNS Considerations
   HostNameLookups
    – DNS query for each incoming request
    – Use logresolve instead.

   Name-based Allow/Deny clauses
    – Two DNS queries per request for each
      allow/deny clause.
stat() for Symlinks
   Options
    – FollowSymLinks
      • Symlinks are trusted.
    – SymLinksIfOwnersMatch
      • Must stat() and lstat() each symlink, yuck!
stat() for .htaccess files
   AllowOverride
    – stat() for .htaccess in each path
      component of a request
    – Happens for any AllowOverride
    – Try to disable or limit to specific subdirectories
    – Avoid use at the DocumentRoot
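
To see why this costs a stat() per path component, the lookups for a single request can be listed. A small Python sketch, assuming the worst case where AllowOverride is enabled all the way down from the filesystem root; the example path is hypothetical:

```python
from pathlib import PurePosixPath


def htaccess_candidates(file_path):
    """List the .htaccess files Apache may have to stat() for one request.

    Assumes AllowOverride is enabled from the filesystem root down,
    which is the worst case for the number of lookups.
    """
    directory = PurePosixPath(file_path).parent
    # parents runs from the nearest ancestor up to "/", so reverse it
    chain = list(directory.parents)[::-1] + [directory]
    return [str(d / ".htaccess") for d in chain]


for candidate in htaccess_candidates("/var/www/html/docs/index.html"):
    print(candidate)
```

Every extra directory level adds one more lookup per request, which is why limiting AllowOverride to the few directories that need it helps.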
stat() for Content Negotiation
   DirectoryIndex
    – Don’t use wildcards like “index”
    – Use something like this instead
      DirectoryIndex index.html index.php

   mod_negotiation
    – Use a type-map instead of MultiViews if
Remove Unused Modules
   Saves Memory
    – Reduces code and data footprint
 Reduces some processing (eg.
 Makes calls to fork() faster

   Static modules are faster than

Common pitfalls and their solutions
Check your error_log
 The first place to look
 Increase the LogLevel if needed
    – Make sure to turn it back down (but not
      off) in production
Check System Health
 vmstat, systat, iostat, mpstat,
  lockstat, etc...
 Check interrupt load
    – NIC might be overloaded
   Are you swapping memory?
    – A web server should never swap
   Check system logs
    – /var/log/messages, /var/log/syslog, etc...
Check Apache Health
   server-status
    – ExtendedStatus   (see next slide)

 Verify “httpd -V”
 ps -elf | grep httpd | wc -l
    – How many httpd processes are running?
server-status Example
Other Possibilities
 Set up a staging environment
 Set up duplicate hardware

   Check for known bugs
    – http://nagoya.apache.org/bugzilla/
Common Bottlenecks
 No more File Descriptors
 Sockets stuck in TIME_WAIT
 High Memory Use (swapping)
 CPU Overload
 Interrupt (IRQ) Overload
File Descriptors
   Symptoms
    – entry in error_log
    – new httpd children fail to start
    – fork() failing across the system

   Solutions
    – Increase system-wide limits
    – Increase ulimit settings in apachectl
Sockets stuck in TIME_WAIT
   Symptoms
    – Unable to accept new connections
    – CPU under-utilized, httpd processes sit idle
    – Not Swapping
    – netstat shows huge numbers of sockets in TIME_WAIT

   Many TIME_WAIT sockets are to be expected
   Only when new connections are failing is it a problem
    – Decrease system-wide TCP/IP FIN timeout
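
One way to put a number on the netstat symptom is to count sockets per TCP state. A minimal Python sketch, assuming the common `netstat -an` layout where TCP lines start with "tcp" and the state is the last column; the sample output is made up for illustration:

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count sockets per TCP state in `netstat -an` style text.

    Assumes TCP lines start with "tcp" and that the state is the last
    whitespace-separated column.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Made-up sample in the Linux `netstat -an` layout.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address    State
tcp        0      0 0.0.0.0:80        0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:80       10.0.0.9:40112     TIME_WAIT
tcp        0      0 10.0.0.5:80       10.0.0.9:40113     TIME_WAIT
tcp        0      0 10.0.0.5:80       10.0.0.7:51544     ESTABLISHED
"""

print(count_tcp_states(SAMPLE)["TIME_WAIT"])  # 2
```

A large count alone is not a fault; it matters only together with failing new connections.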
Memory Overload, Swapping
   Symptoms
    –   Ignore system free memory, it is misleading!
    –   Lots of Disk Activity
    –   top/free show high swap usage
    –   Load gradually increasing
    –   ps shows processes blocking on Disk I/O

   Solutions
    – Add more memory
    – Use less dynamic content, cache as much as possible
    – Try the Worker MPM
How much free memory do I really have?
 Output from top/free is misleading.
 Kernels use buffers
 File I/O uses cache
 Programs share memory
    – Explicit shared memory
    – Copy-On-Write after fork()
   The only time you can be sure is
    when it starts swapping.
CPU Overload
   Symptoms
    –   top shows little or no idle CPU time
    –   System is not Swapping
    –   High system load
    –   System feels sluggish
    –   Much of the CPU time is spent in userspace

   Solutions
    – Add another CPU, get a faster machine
    – Use less dynamic content, cache as much as possible
Interrupt (IRQ) Overload
   Symptoms
    –   Frequent on big machines (8-CPUs and above)
    –   Not Swapping
    –   One or two CPUs are busy, the rest are idle
    –   Low overall system load

   Solutions
    – Add another NIC
         • bind it to the first or use two IP addresses in Apache
         • put NICs on different PCI busses if possible
Virtual Hosts
Virtual Hosting
 Apache was among the first (the first?)
  web servers to offer virtual hosting.
 With virtual hosting many URLs can be
  associated with one IP address
    – this is useful as IP addresses are a limited
      resource
   IIS as supplied free with W2K/XP does not
    support virtual hosting.
Many hosts, one IP

   Several Hosts may translate to the same
    IP address.
    – IP addresses are a scarce resource.
   An Apache server listening on that address reads the Host: field to
    see where to look for the page to serve.
Host field
   http://www.ollieclark.com/acronyms
   The HTTP request:
       GET /acronyms.html HTTP/1.1
       Host: www.ollieclark.com
   Apache uses the Host header to see
    which domain was requested
    – this is only available in HTTP/1.1

   Apache checks its virtual hosts for the
    requested Host to see which page to serve
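
The lookup described above can be sketched in Python. This is an illustration of the idea, not Apache's code: the virtual host table and document roots are hypothetical, and the fallback is passed in explicitly, whereas Apache falls back to the first virtual host defined for the address.

```python
def select_document_root(request_text, vhosts, default_root):
    """Pick a document root from the Host header of a raw HTTP request.

    `vhosts` maps host names to document roots (hypothetical values).
    Requests without a matching Host header get `default_root`.
    """
    host = None
    # Skip the request line; the headers end at the first blank line
    for line in request_text.split("\r\n")[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional port; host names are case-insensitive
            host = value.strip().split(":")[0].lower()
            break
    return vhosts.get(host, default_root)


VHOSTS = {
    "www.ollieclark.com": "/var/www/ollieclark",
    "admin.myfirm.co.uk": "/var/www/admin",
}
REQUEST = "GET /acronyms.html HTTP/1.1\r\nHost: www.ollieclark.com\r\n\r\n"
print(select_document_root(REQUEST, VHOSTS, "/var/www/html"))  # /var/www/ollieclark
```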
An Example
 We want to give convenient access to
  some administrative functions at
  www.myfirm.co.uk site
 We want the URL
  to run a script for administering the site.
 We add a virtual domain
    – this is OK as registered .co.uk domain will be
    – In fact 'www' indicates a subdomain
Adding Virtual Hosts
   The NameVirtualHost directive specifies an
    interface on which Apache will accept
    virtual host requests.
    – '*' means all interfaces.
    – can be several
    – Virtual hosts on the loopback interface
   Why set up virtual hosts on your local computer?
   Use the Hosts file
     – also on Linux
   Add entries:

   Then http://admin.myfirm.co.uk/ … will go to the local
    Apache instance, which will process the virtual hosts as it
    would in a real set up. Useful for constructing a website
Security – small rant
   "Security" has three aspects:
    A. Security. Data is not lost.
    B. Availability. Data is available to its owners
    C. Privacy. Data is not available to others
 It is trivial to achieve C on its own.
 The challenge is to achieve acceptable
  levels of A and C while allowing sufficient
  of B.
 Advice to keep an Apache web server
  secure is often just "Don't allow …".
Access (external)
   Security as regards visitors to websites
    hosted by Apache on the web-server.
    – External security is managed by .htaccess files
    – and in the main configuration files
   An .htaccess file is placed in a directory
    and manages access to that directory.
   An .htaccess file may be placed in any directory
   It controls many features of how Apache treats that
     – security
     – execute scripts
     – use server-side includes
   .htaccess files only work if the main configuration file has
    permitted them by an appropriate AllowOverride directive:

[Figure: annotated .htaccess file. Callouts: "Name of the password file", "Simplest type of password", "Displayed to …", "User must give password", "Cannot GET or POST without …"]

   To protect a directory /htdocs/secure
    we place an .htaccess file in it
   This is a text file as above.
Order Allow Deny
 Three directives really: Order, Allow, Deny
 Allow directives specify who can access a
  resource
 Deny directives specify who cannot
  access a resource
 Order directive specifies the order in
  which the Allow and Deny directives are
  evaluated
Order directive
   Order directive takes a single argument which
    is one of:
     – Deny,Allow
     – Allow,Deny
 Deny,Allow evaluates the Deny directives first
  and then the Allow directives, so the Allow
  directives can override the Deny ones. Any
  request which does not match any directive is
  allowed (the default is to allow access).
 Allow,Deny reverses the ordering, so the Deny
  directives can override the Allow ones. The
  default is to deny access.
Allow/Deny directives
 "Allow from" location
 Recall that the address of the client is supplied
 location can be a domain name or partial
  domain name, an IP address or partial IP
  address
    – Allow from comp.leeds.ac.uk would allow
      connections originating from within the School
    – Allow from 129.11 would allow connections from any
      IP address whose first two bytes were 129.11
 Allow from all is legitimate
 The Deny directive has the same syntax

   The example below is one way to allow
    access to clients in the School of

     Order Deny,Allow
     Deny from all
     Allow from comp.leeds.ac.uk
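
The evaluation rules can be modelled in a few lines of Python. This is a simplified sketch, not Apache's implementation: it takes the client's host name and IP address as given (Apache has to resolve names through DNS), it treats any location starting with a digit as an IP address, and the client names and addresses in the example are hypothetical.

```python
def matches(location, hostname, ip):
    """True if a client matches one 'Allow from' / 'Deny from' location."""
    if location == "all":
        return True
    if location[0].isdigit():
        # Full or partial IP address: compare whole leading octets
        want = location.rstrip(".").split(".")
        return ip.split(".")[:len(want)] == want
    # Full or partial domain name: compare whole trailing components
    want = location.lower().lstrip(".").split(".")
    return hostname.lower().split(".")[-len(want):] == want


def access_allowed(order, allow, deny, hostname, ip):
    """Evaluate Order/Allow/Deny for one client."""
    allowed_hit = any(matches(loc, hostname, ip) for loc in allow)
    denied_hit = any(matches(loc, hostname, ip) for loc in deny)
    if order == "Deny,Allow":
        # Deny first, Allow overrides, default is to allow
        return allowed_hit or not denied_hit
    if order == "Allow,Deny":
        # Allow first, Deny overrides, default is to deny
        return allowed_hit and not denied_hit
    raise ValueError("order must be 'Deny,Allow' or 'Allow,Deny'")


# The example rules above, tried against two hypothetical clients.
rules = dict(order="Deny,Allow", allow=["comp.leeds.ac.uk"], deny=["all"])
print(access_allowed(hostname="pc1.comp.leeds.ac.uk", ip="129.11.146.1", **rules))  # True
print(access_allowed(hostname="www.example.org", ip="192.0.2.7", **rules))  # False
```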
Access (internal)
 The security situation as regards other
  users of the web-server.
 A web-server has three relevant classes of user
    – the administrators (root, wheel)
    – users (<username>, users)
    – Apache (nobody, users)
 Users manage websites.
 Apache needs access to the users'
  directories to retrieve web-pages and
  execute cgi-scripts
A Typical Task
 We have a script that is going to create
  and modify the contents of a file.
 Visitors to the site will make these
 We investigate
    – file/directory permissions needed to make this
    – how 'insecure' this leaves the files.
   Steps
    – review file and directory permissions
    – look at application
   File permissions are 'r', 'w', 'x' set
    separately for owner, group, and other.
    – processes run with user and group identity

   owner uses owner permissions only
    group uses group permission only
    other uses other permission only
 Octal digits 0-7
 chmod command: chmod access file
 Access can be three octal digits:
    – 1st for owner, 2nd for group, 3rd for other
    – 4 enables read, 2 enables write, 1 enables
      execute
   So 705 enables rwx for owner, no access for
    group, rx for other; 777 enables rwx for
    everyone; 700 enables rwx for owner but nothing
    for group or other.
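
The octal arithmetic can be checked with a short Python helper. The function name is made up for this illustration; it expands each digit into its read, write and execute bits.

```python
def describe_mode(octal_digits):
    """Expand three octal digits such as '705' into 'rwx---r-x'."""
    if len(octal_digits) != 3 or any(d not in "01234567" for d in octal_digits):
        raise ValueError("expected three octal digits, e.g. '705'")
    text = ""
    for digit in octal_digits:  # owner, then group, then other
        value = int(digit)
        text += "r" if value & 4 else "-"
        text += "w" if value & 2 else "-"
        text += "x" if value & 1 else "-"
    return text


for mode in ("705", "777", "700"):
    print(mode, describe_mode(mode))
```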
   To access a file referenced by a path you
    must have 'x' permission on every directory
    on the path.
    – if 'x' is missing then you cannot list a directory
   To read temp.txt requires 'r' on the file
Create and delete
 To create files in a directory a process
  must have 'w' and 'x' permission on that
  directory
 If you can create a file you can delete any
  file in the directory
    – unless the 'sticky bit' is set, then a process
      can only delete the files it owns (except the
      owner of the directory)

 Web page visitors.html invites the
  user to add a comment.
 The work is done by visitors.py
  which opens the file visitors.txt, adds
  the comment and returns the current
 See visitors.html, visitors.py
Sample permissions

   Set visitors.py permissions to 755
   Set visitors.html to 644
   Set visitors.txt to 666
   Set directory of visitors.txt to 777
   You see these permissions frequently
    – they will work whatever user and group
      Apache is running as.
    – typically Apache runs as user nobody (group
   The script opens visitors.txt for
    – if the file does not exist it is created
    – Creation requires write permissions on the directory
   Creation permission on the directory
    carries with it delete permission
    – so the script could delete the file if it wanted
    – in fact any Apache script on that server can
      delete the file, not just your scripts.
 The malicious user needs to know the file
  system path to the writable directory.
 You only need set other permissions for
  the standard Apache set up. Thus 707, 606,
  404 will do.
    – you can set directory permissions to 705 on your
      home directory. Then other users cannot list your
      directories because they share your group (users,
   Some server set ups allow Apache to run as
    the user who owns the file requested
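
The 705 trick works because exactly one permission class applies to a process. A Python sketch of that rule, assuming hypothetical numeric user and group ids, an Apache user that is not in the file owner's group, and ignoring root (which bypasses these checks):

```python
def permission_class(file_uid, file_gid, proc_uid, proc_gids):
    """Return which permission triple applies: 'owner', 'group' or 'other'.

    Exactly one class is used; a group member never falls through to the
    'other' bits even if they are more generous.
    """
    if proc_uid == file_uid:
        return "owner"
    if file_gid in proc_gids:
        return "group"
    return "other"


def can(mode, wanted, file_uid, file_gid, proc_uid, proc_gids):
    """Check one permission ('r', 'w' or 'x') against an octal mode like 0o705."""
    shift = {"owner": 6, "group": 3, "other": 0}[
        permission_class(file_uid, file_gid, proc_uid, proc_gids)
    ]
    bit = {"r": 4, "w": 2, "x": 1}[wanted]
    return bool((mode >> shift) & bit)


# Hypothetical ids: owner 1000 in group 100, a fellow user 1001 in group 100,
# and Apache running as uid 99 in group 99.
HOME_MODE = 0o705
print(can(HOME_MODE, "r", 1000, 100, proc_uid=1001, proc_gids={100}))  # False
print(can(HOME_MODE, "r", 1000, 100, proc_uid=99, proc_gids={99}))  # True
```

The fellow user is judged by the empty group bits, while Apache is judged by the 'other' bits, so it can still read the directory.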