Installing Nagios on CentOS 4

  1. Installing Nagios on CentOS 4.x/5.x
     1.  System:
     2.  References:
     3.  Packages:
     4.  General Upgrades
         1. Upgrading from 2.4
         2. Upgrading from 2.5
     5.  Set up Apache
     6.  Installation/Configuration
         1. Configure the Nagios Apache file
         2. Set up the password file
         3. Set up the CGI file
         4. Setting up nagios.cfg
     7.  Object configuration files
         1. Timeperiods
         2. Contacts/Contacts groups
         3. Host and host groups
         4. Services
     8.  Starting Nagios
     9.  Escalations
     10. Extended information
     11. Dependencies
     12. SELinux
     13. That's all, folks!

  This document breezes through installing and configuring everything necessary to get Nagios up and
  running. It does not cover the actual configuration directives Nagios uses in detail. For that,
  documentation is readily available from the Nagios website, or locally after Nagios is installed. I'll
  be explaining installation through RPMs and yum from Dag's repo (RPMforge), but source is available if
  you prefer to build your own. Again, documentation for this is readily available. Please see the
  third-party Repositories section of the CentOS wiki if you don't already know how to enable repos. This
  guide also assumes you already have a working e-mail server on your network. That's how
  notifications get sent, and setting one up is beyond the scope of this document.



System:
  o   CentOS 4.x/5.x (Should work for any RHEL/Fedora flavor.)
  o   Nagios: 2.9
References:

  o   Nagios: http://www.nagios.org/

  o   Official Docs: http://nagios.org/docs/

  o   Community Docs: Repositories

  o   Monitoring Exchange: http://www.monitoringexchange.org/

  o   Web Interfaces: http://www.nagios.org/faqs/viewfaq.php?faq_id=183

  o   Visualization additions: http://www.nagvis.org/



Packages:
  o   nagios-2.9-1.el4.rf
  o   nagios-devel-2.9-1.el4.rf
  o   nagios-plugins-nrpe-2.5.2-1.el4.rf
  o   nagios-plugins-1.4.8-2.el4.rf


  Other:

  o   Apache 2.0



General Upgrades

  A quick note about upgrading. Generally, upgrading is as simple as typing yum update package_name.
  Just to be on the cautious side, back up your configuration files in /etc before upgrading. Secondly, always
  read the release notes to make sure configuration files and directives haven't changed.
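
  The backup step can be scripted. The following is a minimal sketch, not part of the original procedure:
  backup_config is a hypothetical helper name, and /root as the archive location is only an assumption.

```shell
# Sketch: archive a configuration directory before running "yum update".
# backup_config is a hypothetical helper, not part of Nagios or yum.
backup_config() {
    src=${1:-/etc/nagios}      # directory to back up
    dest=${2:-/root}           # where the archive goes (assumed location)
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"
    # -C stores paths relative to the parent directory (e.g. "nagios/cgi.cfg")
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example: backup_config /etc/nagios /root && yum update nagios
```

  The function prints the archive path, so you can note where the backup landed before you upgrade.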


Upgrading from 2.4

  If you're upgrading from version 2.4 (or an earlier 2.x version) and you installed by following this guide,
  then a simple yum update will work just fine. As always, it's best to back up any previous configurations
  before upgrading, just in case something goes awry. Also, from release 2.4 to 2.5 the only packages that
  Dag has re-spun are nagios and nagios-devel.


  [me@mymachine ~] yum update nagios nagios-plugins nagios-devel nagios-plugins-nrpe


Upgrading from 2.5

  If you're upgrading from version 2.5 to 2.6, Dag's RPMs had a few quirks. Make sure you back up
  /etc/nagios before continuing.


  [me@mymachine ~] service nagios stop
  [me@mymachine ~] cp -ar /etc/nagios /wherever/nagios_2.5_backup
  [me@mymachine ~] yum update nagios nagios-plugins nagios-devel nagios-plugins-nrpe



  If cgi.cfg, misccommands.cfg, or checkcommands.cfg are missing or saved as .rpmsave or .rpmnew, then
  just copy them back from the backup you just created. Otherwise, ignore the problem just mentioned,
  because the RPMs have since been repaired. Thanks Dag! Also, there is a mistake in the check_linux_raid.pl
  script in the contributed plugins. This is easily fixed. Again, if you don't have any problems running this
  plugin, then it has been fixed as well. This was brought to my attention on the Nagios mailing list. A user
  spotted this and reported it to the packager, so it is probably resolved by now. Anyway, to correct
  check_linux_raid.pl:


  [me@mymachine ~] vim /usr/lib/nagios/plugins/contrib/check_linux_raid.pl
  Comment out line 26:
  use strict;
  #use lib utils.pm
  use utils qw(%ERRORS);


  [me@mymachine ~] nagios -v /etc/nagios/nagios.cfg
  [me@mymachine ~] service nagios start
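
  To see whether the upgrade left any .rpmsave or .rpmnew files behind, as described above, something like
  the following sketch can help. list_rpm_leftovers is a hypothetical helper name; it only lists files and
  changes nothing.

```shell
# Sketch: list configuration files that rpm set aside during an upgrade.
# list_rpm_leftovers is a hypothetical helper; it is read-only.
list_rpm_leftovers() {
    dir=${1:-/etc/nagios}
    find "$dir" -type f \( -name '*.rpmsave' -o -name '*.rpmnew' \) | sort
}

# Example: list_rpm_leftovers /etc/nagios
```

  No output means rpm did not rename or set aside any of your configuration files.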



Set up Apache

  Make sure you have Apache installed and configured. Chances are you already have a web server running
  on your machine, but if not, this will get one running quickly.


  [me@mymachine ~] yum install httpd
  [me@mymachine ~] vim /etc/httpd/conf/httpd.conf



  At a minimum, set the ServerName directive in /etc/httpd/conf/httpd.conf to your IP address. Then start
  Apache, and make sure it's set to start at boot.


  [me@mymachine ~] chkconfig httpd on
  [me@mymachine ~] service httpd start



  Now open up a browser and see if your web server is running: http://localhost (or your IP). You should see
  the Apache 2 test page. If so, move along.
   If you require further assistance with getting Apache going, especially if you need to secure the
   server, then please follow the documentation at http://www.apache.org. The steps above get your web
   server up and running quickly, but provide no security whatsoever. If you're running completely
   internally, that shouldn't be a big deal. Ok, after you get that running, let's install
   Nagios and start working on setting it up. By default, the RPMs you are going to install automatically create
   a nagios.conf file for Apache to use. This file is /etc/httpd/conf.d/nagios.conf.



Installation/Configuration

   Nagios requires several different packages to be installed so that it may perform the magic it does so well.
   The core is the Nagios package itself. Without the plugins package, though, Nagios won't be able to
   actually process any checks on your system. The development package contains the libraries,
   headers, and documentation for developing against Nagios. The other optional packages are the NRPE
   package and NSCA (Nagios Service Check Acceptor), which I don't use. You may have use for it, so check out
   the main site for details. Also, Nagios must run under both the user and group "nagios." The RPM install
   takes care of this step for you, so there's no need to create the user and group.


   [me@mymachine ~] yum install nagios nagios-plugins nagios-plugins-nrpe nagios-devel



   It'll go ahead and pull down a few other packages for dependencies as well. That's it for installation. Let's
   move back over to Apache's side for a bit.


Configure the Nagios Apache file

   Unless you want other options, such as SSL or allowing access to the CGI from only certain
   hosts, the default nagios.conf file will suit your needs. Here's what it looks like:


   ScriptAlias /nagios/cgi-bin "/usr/lib/nagios/cgi"
   <Directory "/usr/lib/nagios/cgi">
   #          SSLRequireSSL
              Options ExecCGI
              AllowOverride None
              Order allow,deny
              Allow from all
   #          Order deny,allow
   #          Deny from all
   #          Allow from 127.0.0.1
              AuthName "Nagios Access"
              AuthType Basic
              AuthUserFile /etc/nagios/htpasswd.users
              Require valid-user
   </Directory>

   Alias /nagios "/usr/share/nagios"
   <Directory "/usr/share/nagios">
   #          SSLRequireSSL
              Options None
              AllowOverride None
              Order allow,deny
              Allow from all
   #          Order deny,allow
   #          Deny from all
   #          Allow from 127.0.0.1
              AuthName "Nagios Access"
              AuthType Basic
              AuthUserFile /etc/nagios/htpasswd.users
              Require valid-user
   </Directory>



   Unless you want other options configured, that's it for now. Let's set up authentication now.


Set up the password file

   If you don't want to use the name "nagiosadmin" simply substitute your name. Remember later on you'll
   need to use the same name in some CGI configuration settings.


   [me@mymachine ~] htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
   New password: type_your_password
   Re-type new password: re-type_your_password
   Adding password for user nagiosadmin



   It's also up to you if you'd like to create a "guest" account. The guest account would allow viewers to see
   various things you specify within Nagios, but it won't give them total access to the CGI interface. For
   example, viewers could see host status information, but can't schedule downtime for hosts...things like this.
   If you want a guest account, add the account.


   [me@mymachine ~] htpasswd /etc/nagios/htpasswd.users guest
   New password: type_your_password
   Re-type new password: re-type_your_password
   Adding password for user guest



   NOTE: Notice I took away the "-c" option. This is the create option. Since you already created the file,
   make sure any other accounts you add are not with the create option. You'll wipe the file out if you do.
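
   One way to avoid wiping the file by accident is to decide on "-c" based on whether the file already
   exists. This is only a sketch: htpasswd_create_flag is a hypothetical helper, and the htpasswd call
   itself is left in a comment because it prompts for a password.

```shell
# Sketch: print "-c" only when the password file does not exist yet,
# so an existing file is never recreated by accident.
# htpasswd_create_flag is a hypothetical helper name.
htpasswd_create_flag() {
    if [ -e "$1" ]; then
        echo ""
    else
        echo "-c"
    fi
}

# Example (prompts for a password):
#   file=/etc/nagios/htpasswd.users
#   htpasswd $(htpasswd_create_flag "$file") "$file" guest
```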


Set up the CGI file

   The next step is to set up the users you just created in the main CGI configuration file. I'm going to assume
   that you are not using a guest account, and that you have only created one admin "nagiosadmin" account.
   Also, ensure you have it set up to use authentication. 1 means on, 0 means off.


   [me@mymachine ~] cd /etc/nagios
   [me@mymachine nagios] vim cgi.cfg


   # AUTHENTICATION USAGE
   use_authentication=1


   # SYSTEM/PROCESS INFORMATION ACCESS
   authorized_for_system_information=nagiosadmin


   # CONFIGURATION INFORMATION ACCESS
   authorized_for_configuration_information=nagiosadmin


   # SYSTEM/PROCESS COMMAND ACCESS
   authorized_for_system_commands=nagiosadmin


   # GLOBAL HOST/SERVICE VIEW ACCESS
   authorized_for_all_services=nagiosadmin
   authorized_for_all_hosts=nagiosadmin


   # GLOBAL HOST/SERVICE COMMAND ACCESS
   authorized_for_all_service_commands=nagiosadmin
   authorized_for_all_host_commands=nagiosadmin



   Save this file when you are finished editing it. There are a lot of other optional parameters to change or
   play with, so have fun customizing the web interface to your liking. Let's test out what you've done so far.
   Restart Apache and browse to http://localhost/nagios/. You should see your pretty little web interface to
   Nagios now, after you supply the credentials that you just created. You can browse through the links to the
   left, but the majority of them won't work because nothing is configured yet.
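
   If the web interface rejects you or shows less than you expect, it is worth confirming that every name in
   the authorized_for_* lines actually has an entry in the password file. A rough sketch follows.
   missing_cgi_users is a hypothetical helper, and it assumes the simple comma-separated layout shown above.

```shell
# Sketch: print users named in authorized_for_* lines of cgi.cfg that
# have no entry in the htpasswd file. missing_cgi_users is a hypothetical
# helper and assumes plain "key=user1,user2" lines.
missing_cgi_users() {
    cgi_file=$1
    htpasswd_file=$2
    grep '^authorized_for_' "$cgi_file" \
        | cut -d= -f2 \
        | tr ',' '\n' \
        | sort -u \
        | while read -r user; do
              # Skip empty entries and the "*" wildcard (any authenticated user)
              if [ -z "$user" ] || [ "$user" = "*" ]; then
                  continue
              fi
              grep -q "^${user}:" "$htpasswd_file" || echo "$user"
          done
}

# Example: missing_cgi_users /etc/nagios/cgi.cfg /etc/nagios/htpasswd.users
```

   No output means every authorized name can actually log in.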


Setting up nagios.cfg

   Once you start looking around in /etc/nagios, you'll see there are a few example configuration files to
   take a peek at, one being "localhost.cfg." This file uses an all-in-one approach to configuring the object
   files described later on. I find this confusing, especially if you eventually have a very large network to
   monitor. Instead, you'll split the configuration into separate files, which will keep you sane later on. Go
   ahead and move this file out of the way. Previously, the sample files were named "bigger.cfg" and
   "minimal.cfg," but with Nagios 2.9 it's now just the one file.


[me@mymachine ~] cd /etc/nagios
[me@mymachine nagios] mv localhost.cfg localhost.cfg_org



Now we're going to open up the main Nagios configuration file. This file is basically self-explanatory with
the comments inside of it. The short version is as follows. Nagios allows you to specify every configuration
from one single file, "localhost.cfg," if so desired. When you have only a few hosts and services to monitor
this idea is rational, but when you have tons of items to monitor this is a bad idea. It's going to take you a
long time to get used to setting up Nagios to begin with, so do yourself a favor and split out all your files
into the categories mentioned below. That means a separate file for hosts and hostgroups, a separate
file for services and servicegroups, and a separate file for everything else you decide to configure. You'll
thank me later. Let's start with the basics. I turn on the external command options in order to allow
commands to be executed from the CGI web interface.


[me@mymachine nagios] vim nagios.cfg


# OBJECT CONFIGURATION FILE(S)
cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg


# EXTERNAL COMMAND OPTION
check_external_commands=1


# EXTERNAL COMMAND CHECK INTERVAL
command_check_interval=1



Go ahead and save the file. Now you'll need to create each file you specified above, because none of them
exist yet within /etc/nagios.


[me@mymachine nagios] touch contactgroups.cfg contacts.cfg hostgroups.cfg hosts.cfg services.cfg timeperiods.cfg
[me@mymachine nagios] chown nagios:nagios contactgroups.cfg contacts.cfg hostgroups.cfg hosts.cfg services.cfg timeperiods.cfg
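
Rather than typing the file list twice, you can let nagios.cfg drive it. This is a sketch under the
assumption that every object file is declared on its own cfg_file= line, as above.
create_missing_cfg_files is a hypothetical helper, and it leaves the chown step to you.

```shell
# Sketch: create every file named on a cfg_file= line that does not
# exist yet. create_missing_cfg_files is a hypothetical helper; it does
# not change ownership, so run chown afterwards as shown above.
create_missing_cfg_files() {
    main_cfg=${1:-/etc/nagios/nagios.cfg}
    grep '^cfg_file=' "$main_cfg" | cut -d= -f2 | while read -r path; do
        if [ ! -e "$path" ]; then
            touch "$path" && echo "created $path"
        fi
    done
}

# Example: create_missing_cfg_files /etc/nagios/nagios.cfg
```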


  One last note about this section. If you are planning on using the external commands on the CGI interface
  (check_external_commands), you might run into a few permissions issues. Please check the Nagios
  documentation on the command file if you get any errors when you try to run a command from the web
  interface. It is located here: http://nagios.sourceforge.net/docs/2_0/commandfile.html



Object configuration files

  As mentioned, when the configuration files are split up, Nagios reads the data from these files in order
  to process host and service checks across the network. Before I begin, note that detailed documentation
  of all of the options for the template-based objects is located at the website. This will help get you
  started though, so let's begin with the timeperiods file. Obviously, you can substitute your own values.


Timeperiods

  [me@mymachine nagios] vim timeperiods.cfg


  # '24x7' timeperiod definition
  define timeperiod{
             timeperiod_name 24x7
             alias                   24 Hours A Day, 7 Days A Week
             sunday                  00:00-24:00
             monday                  00:00-24:00
             tuesday                 00:00-24:00
             wednesday               00:00-24:00
             thursday                00:00-24:00
             friday                  00:00-24:00
             saturday                00:00-24:00
             }


  # 'workhours' timeperiod definition
  define timeperiod{
             timeperiod_name workhours
             alias                   "Normal" Working Hours
             monday                  08:00-17:00
             tuesday                 08:00-17:00
             wednesday               08:00-17:00
             thursday                08:00-17:00
             friday                  08:00-17:00
             }


  # 'nonworkhours' timeperiod definition
  define timeperiod{
             timeperiod_name nonworkhours
             alias                    Non-Work Hours
             sunday                   00:00-24:00
             monday                   00:00-08:00,17:00-24:00
             tuesday                  00:00-08:00,17:00-24:00
             wednesday                00:00-08:00,17:00-24:00
             thursday                 00:00-08:00,17:00-24:00
             friday                   00:00-08:00,17:00-24:00
             saturday                 00:00-24:00
             }


  # 'none' timeperiod definition
  define timeperiod{
             timeperiod_name none
             alias                    No Time Is A Good Time
             }



  You can specify as many of these as you want. For instance, say you need to contact folks only on
  the weekends. You can create a timeperiod named "weekends" and use only Friday, Saturday, and Sunday
  with the appropriate times as you see fit.
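
  As a sketch of that idea, the helper below prints such a definition so you can append it to
  timeperiods.cfg. weekends_timeperiod is a hypothetical helper name, and the hours (Friday evening
  onward) are only an assumption; pick whatever times suit you.

```shell
# Sketch: print a 'weekends' timeperiod definition. The function name and
# the chosen hours are illustrative assumptions, not recommended values.
weekends_timeperiod() {
    cat <<'EOF'
# 'weekends' timeperiod definition
define timeperiod{
           timeperiod_name weekends
           alias           Weekends
           friday          17:00-24:00
           saturday        00:00-24:00
           sunday          00:00-24:00
           }
EOF
}

# Example: weekends_timeperiod >> /etc/nagios/timeperiods.cfg
```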


Contacts/Contacts groups

  Contacts are split into two different files. One holds the actual contact options, and the other holds contacts
  together in groups. The groups are whom you specify Nagios to contact later on.


  [me@mymachine nagios] vim contacts.cfg


  # service_notification_options are w,u,c,r,f,n
  # w=warning u=unknown c=critical r=recovery f=flapping n=none
  # host_notification_options d,u,r,f,n
  # d=down u=unreachable r=recovery f=flapping n=none


  define contact{
             contact_name                                    me
             alias                                           me
             service_notification_period                     24x7
             host_notification_period                        24x7
             service_notification_options                    c,r
             host_notification_options                       d,r
             service_notification_commands                   notify-by-email
             host_notification_commands                      host-notify-by-email
             email                                           me@myemailaddress.whatever
             }


  define contact{
             contact_name                                    you
             alias                                           you
             service_notification_period                     workhours
             host_notification_period                        workhours
             service_notification_options                    c,r
             host_notification_options                       d,r
             service_notification_commands                   notify-by-email
             host_notification_commands                      host-notify-by-email
             email                                           you@youremailaddress.whatever
             }




You can choose to do as you wish, but for my purposes I only set contacts up to be notified on critical and
recovery alerts. I have no interest in being alerted when most of the things I'm monitoring hit a
temporary glitch or a warning state, especially at 4:00 a.m. The reason is that a) I frequently check the
Nagios CGI interface throughout the day, and b) all of my alerts get forwarded to a ticketing system. With
that said, I don't want unnecessary tickets being generated simply because a plugin failed to execute this
time around. If I were very inspired, I could set up a separate contact and group to receive only the
warnings and unknowns, and then pipe these through a different e-mail address. Again, this is completely
adaptable to your needs. Also, I'm using e-mail only. My e-mail system takes care of processing where the
alerts are going. However, you could set up Nagios to pipe messages straight to pagers. Again, check the
object configuration options for contacts.cfg in the docs. If you want to see the commands being executed
for alerts, check out /etc/nagios/misccommands.cfg. On to the contact groups.


[me@mymachine nagios] vim contactgroups.cfg


# 'einsteins' contact group definitions
define contactgroup{
           contactgroup_name                   einsteins
           alias                               einsteins
           members                             me,you
           }


  This is a simple example of contacts and contact groups. You can nest as many possibilities as you really
  want to. You can create as many contacts as you need as well. It's rather straightforward.
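
  A group member that has no matching contact definition is an easy mistake to make once the two files grow.
  The sketch below cross-checks them. undefined_contact_members is a hypothetical helper, and it assumes
  the one-directive-per-line layout used in this guide.

```shell
# Sketch: print contact group members that have no matching contact_name
# in the contacts file. undefined_contact_members is a hypothetical helper.
undefined_contact_members() {
    contacts_file=$1
    groups_file=$2
    # Pull the value of every "members" line and strip the whitespace
    awk '$1 == "members" { $1 = ""; gsub(/[ \t]/, ""); print }' "$groups_file" \
        | tr ',' '\n' \
        | sort -u \
        | while read -r member; do
              [ -n "$member" ] || continue
              # awk exits 0 only when a contact with this name is defined
              awk -v wanted="$member" \
                  '$1 == "contact_name" && $2 == wanted { found = 1 } END { exit !found }' \
                  "$contacts_file" || echo "$member"
          done
}

# Example: undefined_contact_members contacts.cfg contactgroups.cfg
```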


Host and host groups

  Host and host group information is stored in the two files hosts.cfg and hostgroups.cfg. Just as you can mix
  and match contacts in various contact groups, you can do the same thing with host names in host groups. I
  prefer to create template configurations that I can leech off of later on in my configuration file. It saves you
  an incredible amount of time typing down the road.


  [me@mymachine nagios] vim hosts.cfg


  # Generic host definitions
  define host{
              name                            generic-host  ; Generic template name
              notifications_enabled           1             ; Host notifications are enabled
              event_handler_enabled           1             ; Host event handler is enabled
              flap_detection_enabled          1             ; Flap detection is enabled
              process_perf_data               1             ; Process performance data
              retain_status_information       1             ; Retain status information across program restarts
              retain_nonstatus_information    1             ; Retain non-status information across program restarts
              register                        0             ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
              }


  # This creates a generic template that any host can use.
  # Notifies never, checks 15 times before showing critical on CGI interface.


  define host{
              name                                basic-host
              use                                 generic-host
              check_command                       check-host-alive
              max_check_attempts                  15
              notification_interval               0
              notification_period                 none
              notification_options                n
              register                            0
              }


# This creates a generic host that your routers can use.
# Monitors host(s) 24x7, notifies on down and recovery, checks 15 times before going critical,
# notifies the contact_group every 30 minutes.


define host{
            name                                 your-routers-host
            use                                  generic-host
            check_command                        check-host-alive
            max_check_attempts                   15
            notification_interval                30
            notification_period                  24x7
            notification_options                 d,r
            register                             0
            }


define host{
            use                                  basic-host
            host_name                            mymachine1
            alias                                mymachine1
            address                              192.168.100.101
            contact_groups                       einsteins
#           notification_options                 d,r    #overrides the basic-host option
            }


define host{
            use                                  your-routers-host
            host_name                            router1
            alias                                router1
            address                              192.168.100.100
            contact_groups                       einsteins
            }




  You can begin to see how much time predefined templates can save you down the road when adding
  hosts. I'm monitoring around 100 hosts and over 200 services, so doing things the template way can really
  be productive in the long haul. It can get a little confusing, but stick to the docs and you'll learn pretty
  quickly. For all of the template object options each file can contain, look at
  http://localhost/nagios/docs/configobject.html (substitute your IP for localhost if needed). This will help
  you tremendously, because there are so many options Nagios allows you to choose from. I split things up
  because I want notifications on your-routers-host, but I don't want notifications on the basic-host
  container. If you want to override the basic-host notification settings, then just specify them within the
  host definition itself. Starting to understand why you use templates?
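
  To make the time savings concrete, here is a sketch that prints a new host definition from a template
  name and three values. host_definition is a hypothetical helper, and router2 and 192.168.100.102 in the
  usage line are made-up values for illustration.

```shell
# Sketch: print a host definition that inherits everything else from a
# template. host_definition is a hypothetical helper name.
host_definition() {
    template=$1
    host=$2
    address=$3
    group=$4
    cat <<EOF
define host{
            use                                  $template
            host_name                            $host
            alias                                $host
            address                              $address
            contact_groups                       $group
            }
EOF
}

# Example (router2 and its address are hypothetical):
#   host_definition your-routers-host router2 192.168.100.102 einsteins >> /etc/nagios/hosts.cfg
```

  Five short lines per host is all that remains once the check and notification settings live in the
  templates.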


  Some people have commented that my logic here is confusing, but it will save you a ton of typing. If you
  only have a few hosts to be checking on then it probably is overkill. Ok, now on to host groups.


  [me@mymachine nagios] vim hostgroups.cfg


  define hostgroup{
              hostgroup_name           basic-clients
              alias                    basic clients
              members                  mymachine1
              }


  define hostgroup{
              hostgroup_name           your-routers
              alias                    routers
              members                  router1
              }



  That's about as simple as this can get. You assign your clients from hosts.cfg to host groups in this file.
  You can split them into multiple groups. For instance, mymachine1 can live within both the basic-clients
  and your-routers groups if you so desire. Pretty simple...


Services

  To start, you're going to need at least one service to monitor. This would be a simple check-host-alive, or
  ping. Again, you can split things into templates to make it easier down the road, just as demonstrated
  above.


  [me@mymachine nagios] vim services.cfg


  # Generic service definition template
  define service{
          name                            generic-service   ; Generic service name
          active_checks_enabled           1                 ; Active service checks are enabled
          passive_checks_enabled          1                 ; Passive service checks are enabled/accepted
          parallelize_check               1                 ; Active service checks should be parallelized (Don't disable)
          obsess_over_service             1                 ; We should obsess over this service (if necessary)
          check_freshness                 0                 ; Default is to NOT check service 'freshness'
          notifications_enabled           1                 ; Service notifications are enabled
          event_handler_enabled           1                 ; Service event handler is enabled
          flap_detection_enabled          1                 ; Flap detection is enabled
          process_perf_data               1                 ; Process performance data
          retain_status_information       1                 ; Retain status information across program restarts
          retain_nonstatus_information    1                 ; Retain non-status information across program restarts
          register                        0                 ; DONT REGISTER THIS DEFINITION - NOT A REAL SERVICE, JUST A TEMPLATE!
          }


# Generic for all services
define service{
        use                             generic-service
        name                            basic-service
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              15
        normal_check_interval           10
        retry_check_interval            2
        notification_interval           0
        notification_period             none
        register                        0
        }


define service{
        use                             basic-service
        name                            ping-service
        notification_options            n
        check_command                   check_ping!1000.0,20%!2000.0,60%
        register                        0
        }


define service{
            use                                              ping-service
            service_description                              PING
            contact_groups                                   einsteins
            hostgroup_name                                   basic-clients,your-routers
#           host_name                                        one_client
            }



This is an example of how to nest templates. You can use hostgroup_name or host_name individually. I've
declared a general template called "basic-service" which leeches off of the "generic-service"
definition above it. Then ping-service narrows it down even further. The reason I split this out is
that you may want to create another host group called "your-switches," but have notifications on this
service go to a different contact group. Then you just define another service definition, add this
host group to that definition, and apply a different contact group. Ultimately, the last definition overrides
all other containers above it. Last man standing type deal: the last option Nagios sees is the one it goes by.
In the example below, the ping-service is still the same, but I want it to go to a different contact group.
Same logic as was explained for the hosts.cfg and hostgroups.cfg files.


define service{
            use                                              ping-service
            service_description                              PING
            contact_groups                                   group2
            hostgroup_name                                   your-switches
#           host_name                                        one_client
            }
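
When the nesting gets deep, it helps to see which templates a definition leeches from. The sketch below
follows the "use" lines from one template name upward. template_chain is a hypothetical helper; it assumes
one directive per line, a single template per "use" line, and a closing brace on its own line.

```shell
# Sketch: print the inheritance chain of a template by following "use".
# template_chain is a hypothetical helper, not a Nagios tool.
template_chain() {
    cfg_file=$1
    start=$2
    awk -v start="$start" '
        $1 == "define" { name = ""; parent = "" }   # new definition begins
        $1 == "name"   { name = $2 }                # template name
        $1 == "use"    { parent = $2 }              # template it inherits from
        $1 == "}"      { if (name != "") parents[name] = parent }
        END {
            current = start
            # Stop at the top of the chain, or if a loop is detected
            while (current != "" && !(current in seen)) {
                seen[current] = 1
                printf "%s%s", sep, current
                sep = " -> "
                current = parents[current]
            }
            printf "\n"
        }
    ' "$cfg_file"
}

# Example: template_chain /etc/nagios/services.cfg ping-service
```

With the services.cfg from this section, the chain for ping-service runs through basic-service up to
generic-service.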



The services.cfg file can get pretty cumbersome because of all the different checks you can configure. For
instance, you can set it up to check smtp service through check_smtp, http services through check_http,
dhcp, dns, and all sorts of items through SNMP plugins. I'll give you an example of an smtp service check.


# SMTP - ensure SMTP services are available.
define service{
            use                                              basic-service
            name                                             smtp-service
            service_description                              SMTP
            notification_interval                            15
            contact_groups                                   einsteins
            notification_options                             c,r
            notification_period                              24x7
            check_command                                  check_smtp
            register                                       0
            }


define service{
            use                                            smtp-service
            hostgroup_name                                 smtp-servers
#           host_name                                      one_client
            }



Again, obviously this leeches off the template above it, then defines the actual host groups to check. The
host group smtp-servers would have to exist in hostgroups.cfg, and there would have to be hosts defined
in hosts.cfg.


Before I continue, let me explain a bit as to what actually occurs with these files. Nagios reads the
configuration options from all of these text files. When it's time to process the smtp-service you have
defined, it looks to see what check_command it's supposed to execute. It then looks in the
checkcommands.cfg file to look up what check_smtp is supposed to actually do. This would be:


# 'check_smtp' command definition
define command{
            command_name           check_smtp
            command_line           $USER1$/check_smtp -H $HOSTADDRESS$
            }



Great it says, I've found it! Nagios now knows to go to /usr/lib/nagios/plugins/ (default path for the RPM
install) and execute the check_smtp plugin it finds there. It substitutes the $HOSTADDRESS$ with the
hosts located in the host groups, goes out and checks the server to see if SMTP is running. It then returns
back with a yay or nay, Nagios processes this information according to the options you have laid out in the
configuration files, and displays the information on the CGI interface.
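
To see the substitution step in isolation, the sketch below expands the two macros from the check_smtp
command line by hand. expand_command_line is a hypothetical helper that only mimics what Nagios does
internally, and it handles just these two macros.

```shell
# Sketch: expand $USER1$ and $HOSTADDRESS$ in a command_line string.
# expand_command_line is a hypothetical helper that imitates the macro
# substitution Nagios performs; it knows only these two macros.
expand_command_line() {
    command_line=$1
    user1=$2            # plugin directory, e.g. /usr/lib/nagios/plugins
    host_address=$3     # the "address" value of the host being checked
    printf '%s\n' "$command_line" \
        | sed -e 's|\$USER1\$|'"$user1"'|g' \
              -e 's|\$HOSTADDRESS\$|'"$host_address"'|g'
}

# Example:
#   expand_command_line '$USER1$/check_smtp -H $HOSTADDRESS$' /usr/lib/nagios/plugins 192.168.100.101
# prints: /usr/lib/nagios/plugins/check_smtp -H 192.168.100.101
```

Running the printed command by hand is a quick way to debug a check that misbehaves inside Nagios.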


This, in essence, is how to start setting up Nagios. I've simplified this quite a bit, but you should now have
a good understanding of where to at least begin with configuring hosts and services. Look in
/usr/lib/nagios/plugins to see everything you can check out of the box. The list is very large and varied.
Also, check out http://www.monitoringexchange.org to view all sorts of third-party plugins written by
many community members. I do a lot of checks across SNMP, so be sure to check that out. Also, you can
easily write your own plugins to use. There are many extra things you can do within Nagios itself, such as
defining escalations and extended service/host information. I'll explain that after you get Nagios fired up
so you can see what it's about.
Starting Nagios

  At this point, you should have a working configuration with a host or two for monitoring. Since we haven't
  done so yet, let's configure the Nagios daemon to start at boot, check the configuration files for errors,
  and then start it.


  [me@mymachine nagios] chkconfig nagios on
  [me@mymachine nagios] nagios -v nagios.cfg


  Nagios 2.4
  Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
  Last Modified: 05-31-2006
  License: GPL


  Reading configuration data...


  Running pre-flight check on configuration data...


  Total Warnings: 85
  Total Errors:           0


  Things look okay - No serious problems were detected during the pre-flight
  check


  [me@mymachine nagios] service nagios start


  Starting network monitor: nagios



  You'll notice my instance has 85 warnings displayed. This is because I have 85 services being checked
  that have no contact group(s) associated with the service. Warnings are usually ok to let go. As long as the
  check (nagios -v) says "Things look okay" then you're usually fine. To avoid the warnings, simply do what
  the warning says and fix the issue it's spewing.
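
  As a sketch of clearing that particular warning, the SMTP service from earlier could name a contact group. The group name admins is hypothetical; it would have to be defined in your contact groups file:

```cfg
define service{
            use                                            smtp-service
            hostgroup_name                                 smtp-servers
            contact_groups                                 admins   ; hypothetical group name
            }
```

  Putting contact_groups in the smtp-service template instead would cover every service that uses the template.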



Escalations

  Escalations are pretty cool in that they allow you to specify where second, third, fourth, and so on,
  notifications can go. For instance, say you have the SMTP service set up to notify a contact group every 30
  minutes indefinitely until someone resolves the problem. With an escalation set up, you can tell
  notifications 2, 3, 4 to go to this e-mail address, or this pager, and then you can tell notifications 5, 6, 7 to go
  to yet another address or pager, and so on. I use this extensively because I have the first notification go to
  ticketing software, then set all subsequent notifications to go to simply a pager. I don't want multiple
  tickets being created by the same incident, but I want Nagios to page the hell out of me until I respond to
  the event. Let's take a peek. This assumes you've added the file to nagios.cfg as well as created the
  file in /etc/nagios.


  [me@mymachine nagios] vim escalations.cfg
  define serviceescalation{
               host_name                           mymachine1
               service_description                 SMTP
               first_notification                  1     ; applies from the first notification (use 2 to skip the first)
               last_notification                   0     ; 0 = no upper limit
               notification_interval               30
               contact_groups                      mypager
               }



  I define host escalations and service escalations all in the same file as above. You can split these two out
  just like anything else. Just specify it in the nagios.cfg file to tell the program where the file resides. I don't
  split these because I don't have too many escalations to really be concerned with. Your mileage may vary.
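
  For the host side, a definition in the same file might look like the sketch below. The host name and contact group are reused from the service example, and starting at notification 2 is just one possible choice:

```cfg
define hostescalation{
             host_name                           mymachine1
             first_notification                  2     ; leave the first notification to the host's own contact group
             last_notification                   0     ; 0 = keep using this escalation until the problem is resolved
             notification_interval               30
             contact_groups                      mypager
             }
```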



Extended information

  Extended information is a bonus feature and is used mainly for aesthetic reasons on the web interface.
  It can be split up into host extended information and service extended information. With it you can
  put pretty little icons beside host names, specify URLs to links outside of Nagios, and make
  things look "pretty" on the map systems. I use the service extended information to point to links outside of
  Nagios hosting MRTG graphs. I'll show you how you can do this. Remember to list this file in
  nagios.cfg and create the file.


  [me@mymachine nagios] vim serviceextinfo.cfg
  # yum's definitions
  define serviceextinfo{
               host_name                           yum
               service_description                 PING
               notes_url                           http://mynagiosbox/mrtg/myfile.html
               icon_image                          graph.gif
               icon_image_alt                      View graphs
               }



  This puts a pretty little icon beside the PING service on the web interface. When you click on this icon, it
  takes you directly to the MRTG graph I have running on the same machine. In my case, I have an internal
  yum server rsyncing every night to the mirrors. All of the ethernet traffic is graphed through MRTG, then
  Nagios points a link to this so it's easy to navigate to. This builds up a good history of bandwidth
  usage, among other things. Use some creativity and you can log, graph, and link to just about anything you
  want, for example processes and users logged into a system.
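
  The host version works the same way. A sketch, reusing the yum host from above; the URL and image file names are placeholders, not files known to ship with Nagios:

```cfg
define hostextinfo{
             host_name                           yum
             notes_url                           http://mynagiosbox/mrtg/yum-host.html
             icon_image                          server.png
             icon_image_alt                      yum server
             statusmap_image                     server.gd2
             }
```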



Dependencies

  Another interesting file I use holds the host and service dependency options. What this does is set up a tier
  of checks before something alarms out. For example, I check a login service of a server that's not a Linux
  box. I have about 15 other services being checked on this host, but they are dependent on being able to
  log in to the machine before processing these checks. When a login is unsuccessful, I don't want 15
  services to start freaking out and paging me, so I set up a dependency tree. If login fails, only the login
  alarms out. I get one notification for this, not a zillion for all the other checks. You can use this feature for
  hosts as well. Again, specify it in the nagios.cfg file and create the files.


  define servicedependency{
              host_name                                         your_host
              service_description                               LOGIN
              dependent_host_name                               your_host
              dependent_service_description                     another_service
              execution_failure_criteria                        w,u,c
              notification_failure_criteria                     w,u,c
              }



  The execution failure criteria tell Nagios what to do when the "LOGIN" service is down: the
  "another_service" check won't even be run while LOGIN is in a warning, unknown, or critical state.
  The notification failure criteria determine when notifications should not be sent out. If the login check is in
  a warning, unknown, or critical status, then no messages will be sent out for another_service.
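
  For hosts, the equivalent object is a host dependency. A sketch with hypothetical host names, where a server behind a router stays quiet while the router is down or unreachable:

```cfg
define hostdependency{
            host_name                                         your_router
            dependent_host_name                               your_host
            notification_failure_criteria                     d,u
            }
```

  Here d and u stand for the down and unreachable host states, in the same way w,u,c stand for the service states above.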


  Just make sure when you are done adding, editing, or creating new configuration files, that you run the
  nagios -v nagios.cfg command. This processes your configuration files and does a check on them prior to
  actually refreshing the service.
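
  For reference, the nagios.cfg entries for the extra files covered above might look like this. escalations.cfg and serviceextinfo.cfg are the names used earlier; dependencies.cfg is an assumed name, since any file name works as long as nagios.cfg points to it:

```cfg
cfg_file=/etc/nagios/escalations.cfg
cfg_file=/etc/nagios/serviceextinfo.cfg
cfg_file=/etc/nagios/dependencies.cfg
```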



SELinux

  A word about SELinux. I don't use it currently, because in 4.x, it messed with some things, and I haven't
  taken the time to learn it. I know in 5, it's supposed to be much more mature, so try it out. I turned it off
  when I verified this worked on CentOS 5, so if you run into any strange things, keep SELinux in mind. On
  CentOS 5.2, SELinux prevents the Apache httpd server from accessing the needed
  /var/nagios files. A CentOS 5.2 workaround is to execute the command:
   chcon -R -t httpd_sys_content_t /var/nagios



That's all, folks!

   Basically, this is Nagios summed up. I'm simplifying almost everything. I hope I've explained things in a
   simple fashion anyways. Documentation for the utility is wonderful, but there's so much documentation that
   it's hard to learn where to get started sometimes. If you have a network to maintain, I advocate getting
   Nagios (unless you like other utilities) running to big brother your hosts and devices. It's saved my IT
   departments' skin on more than one occasion. Like I mentioned before, it'll take you a long time to get
   good at it, and it's not easy to figure out at first, but once you get a grip on Nagios, you'll wonder how you
   got along without it before. I'm checking everything from simple pings to check host alive status, to disk
   usage stats, memory stats, DHCP, DNS, HTTP, temperature in machine rooms, yum updates, CPU loads,
   SNMP information from hosts, to anything you can imagine. I'm leaving a lot of things out, but you get the
   idea. Virtually anything you can think of keeping an eye on, you can do so with Nagios. You can write
   your own plugins, or visit the Monitoring Exchange site I mentioned earlier to find just about anything.


   One more thing I would like to mention is the ability to configure and maintain Nagios solely through the
   web interface. Nagios doesn't come with pre-packaged add-ons for doing so, but you can find information
   for three different packages here: http://www.nagios.org/faqs/viewfaq.php?faq_id=183. I personally have
   not used any of them, but I guess for the command line challenged it could prove useful.


   If you have anything to add or if you notice something wrong, please let me know so I can correct it. The
   original is written in HTML, and I have to adapt my formatting to use on this wiki, so there might be some
   typos. Enjoy.

				