Best Practices on Campus Network Monitoring


    Ljubljana, October 20 2011
    Vidar Faltinsen, UNINETT
“We are convinced that most campuses do not
  take the task of measuring and understanding
  their traffic flows sufficiently seriously”

                   - SERENATE, December 2003
Our context

   The network is complex
       A lot of equipment
       Heaps of traffic around the clock

   No system is perfect
       Errors will occur – incidents will hit us
   Motto: be proactive and ahead
       The user should not call you – you should be the
        first to know!
   Keep in mind: If information is good…
     (posted at the right time, kept up to date)…
    …the user is (more) patient!
NAV at a glance

   [Figure: NAV architecture diagram built around NAVdb. Labels in the diagram:]
     Tools: Machine and user trackers, Report Generator, Device History,
       IP Device Info, L2 traceroute, Network Explorer, Detain machines,
       Configure switch ports, Traffic Maps (geo and topo), Ranked statistics,
       Alert profiles, Status
     Data collection: SNMP collector, RRD, Cricket
     Monitors: Status Monitor, Module Monitor, Service Monitor, Threshold Monitor
     Event handling: Event Engine, Alert Engine
     Inputs: the network (SNMP, SNMP trap); external systems (SNMP trap or email)
     Alert channels: email, SMS, jabber

   http://metanav.uninett.no
Do the most important first

For all your equipment:
  1.   Ping
  2.   If down
         => send SMS
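
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of any tool named in these slides: the function names are hypothetical, the `ping` command is assumed to be on the path, and `notify` stands in for whatever SMS gateway you use (it defaults to `print`).

```python
import platform
import subprocess


def is_up(host, timeout_s=2):
    """Return True if a single ICMP echo request to host is answered."""
    # The count flag differs between Windows (-n) and Unix-like systems (-c).
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping command could not be started.
        return False
    return result.returncode == 0


def check_all(hosts, probe=is_up, notify=print):
    """Probe every host and send one notification per host that is down."""
    down = [host for host in hosts if not probe(host)]
    for host in down:
        notify(f"DOWN: {host}")  # replace notify with your SMS sender
    return down
```

Calling `check_all(["192.0.2.10", "192.0.2.11"])` from cron every minute would already cover the most important case. Passing `probe` and `notify` as parameters keeps the sketch testable without touching the network.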
Without numbers you are nothing

     When an incident occurs – do you have enough data to
      investigate – and actually pinpoint the cause?
     Disk is cheap
     Collect heaps of statistical data
     Have a scheme for compressing data as it ages
             (RRD method)
     Focus on good search tools, reports and visualisation
      methods to make traffic/statistical anomalies easy to detect
         Isolating and classifying an error tends to
          consume most of the recovery time
     Autodetection of thresholds and more complex anomaly
      detection is even better
         Remember to moderate the total flow of alarms
          (classify alarms)
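
Two of the points above can be illustrated with a short Python sketch. `consolidate` imitates the idea behind the RRD method (older samples are averaged into coarser buckets), and `auto_threshold` derives an alarm level from history as mean plus k standard deviations. Both function names and the choice of k = 3 are assumptions made for illustration; this is not how RRDtool or NAV implement it.

```python
import statistics


def consolidate(samples, factor):
    """Average each run of `factor` consecutive samples into one value.

    A trailing partial run is averaged over the samples it contains.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    chunks = (samples[i:i + factor] for i in range(0, len(samples), factor))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def auto_threshold(history, k=3):
    """Suggest an alarm threshold: mean of history plus k standard deviations."""
    return statistics.mean(history) + k * statistics.pstdev(history)
```

For example, twelve 5-minute samples consolidated with `factor=12` become a single hourly average, so a year of history stays small while recent data keeps full resolution.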
Logs are gold, scripts as well

   Log, log, log
        Syslog is also a management system
   Small (shell) scripts can be gold
        A good idea can be only a few code lines
         away…
        A culture that motivates creativity and allows
         continuous implementation of new
         scripts/add-ons will step by step improve
         the overall management process!
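
As an example of how small such a script can be, the sketch below ranks devices by the number of syslog messages they have sent. It assumes the traditional BSD syslog line layout (`Mmm dd hh:mm:ss host message`); the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def messages_per_host(lines):
    """Count syslog messages per host, most talkative host first."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expected layout: month, day, time, host, then the message text.
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts.most_common()


sample = [
    "Oct 20 10:15:01 mtfs-272-sw link down on port 12",
    "Oct 20 10:15:02 mtfs-272-sw link up on port 12",
    "Oct 20 10:16:40 core-gw fan failure",
]
for host, count in messages_per_host(sample):
    print(f"{count:5d}  {host}")
```

A device that suddenly tops this list is often worth a look before any user has noticed a problem.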
Avoid a monolithic NMS

   Not an absolute rule, but be a sceptic

   If the system is too massive it tends to set the agenda.
       You should shape the system, not the other way
         around.
       If too many resources must be invested in
         understanding the system…
       …then even more resources must be put into
         adapting the system to your needs

   The NMS has no intrinsic value…
        …it should be a useful tool for you

   But remember nothing is for free – you must in any
    case invest in understanding what your tools actually do
FCAPS – so many tasks to cover

      Fault management
      Configuration management
      Accounting management
      Performance management
      Security management
Not one tool - a set of tools
   Special-purpose tools with limited scope are good
        Examples of tool categories:
              inventory systems
              trouble ticket systems
              status monitors
              measurements (and threshold monitors)
              server/services focused
              netflow analysis
              security-focused
              configuration tools
              simulation
   Tools should (ideally) not overlap
   Have a well-defined single authority as the source for your data sets, i.e.
        the set of equipment (with attributes) we manage is defined in
         one place
        similarly for our locations (with attributes), etc.
   Autodetection is good
        But in a controlled environment (be aware of weak SNMPv2
         security)
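
The single-authority idea can be sketched as follows: the equipment list lives in one place, and every tool derives its own view from it instead of keeping a private copy. The inventory records, field names and helper functions are all hypothetical and only illustrate the principle.

```python
from collections import defaultdict

# The one place where managed equipment is defined (hypothetical records).
INVENTORY = [
    {"name": "mtfs-272-sw", "ip": "192.0.2.10", "location": "mtfs"},
    {"name": "mtfs-310-sw", "ip": "192.0.2.11", "location": "mtfs"},
    {"name": "lib-101-sw", "ip": "192.0.2.20", "location": "lib"},
]


def ping_targets(inventory):
    """View for the status monitor: the addresses to ping."""
    return sorted(device["ip"] for device in inventory)


def devices_by_location(inventory):
    """View for reports: device names grouped per location."""
    grouped = defaultdict(list)
    for device in inventory:
        grouped[device["location"]].append(device["name"])
    return dict(grouped)
```

Adding a switch then means one new record, and both the status monitor and the reports pick it up on their next run.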
GigaCampus tool boxes
Managing 30 campus networks around Norway

    The tool boxes are servers containing a number
     of management tools:
       NAV: Proactive network management
         http://metanav.uninett.no
       nfsen: Netflow traffic analysis
         http://nfsen.sourceforge.net/
       Stager: Netflow and Qflow
         http://software.uninett.no/stager/
       Hobbit: Service monitoring
         http://hobbitmon.sourceforge.net/
       tftp server, syslog server, radius server

    The tool boxes are placed on campus and used
     by the local IT staff.

    Management, tool enhancements, software
     upgrades, etc. are done by UNINETT.

    Free training in tool usage is given.
NAV – Network Administration Visualized
   Network management system
    developed by UNINETT and NTNU
    since 1999.

Key features
  Inventory information with topology
         topology autodetected
         L3, L2, per vlan
   Status monitor with alarm system
         sms and email alarms
   Client machine tracking
         based on ARP and bridge table data
   Client machine detention
   Statistics and graphing
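
Client machine tracking "based on ARP and bridge table data" is essentially a join: ARP data maps IP to MAC, bridge table data maps MAC to switch port. The sketch below shows that join under simplifying assumptions (both tables already collected into dictionaries, one port per MAC). It is an illustration of the principle, not NAV's actual implementation, and all names are hypothetical.

```python
def locate_clients(arp_table, bridge_table):
    """Join ARP data (ip -> mac) with bridge data (mac -> (switch, port)).

    Returns {ip: (mac, switch, port)} for every IP whose MAC address
    was seen on a switch port.
    """
    # Normalise MAC addresses so that case differences do not break the join.
    ports = {mac.lower(): where for mac, where in bridge_table.items()}
    located = {}
    for ip, mac in arp_table.items():
        key = mac.lower()
        if key in ports:
            switch, port = ports[key]
            located[ip] = (key, switch, port)
    return located


arp = {"192.0.2.55": "00:11:22:AA:BB:CC"}
bridge = {"00:11:22:aa:bb:cc": ("mtfs-272-sw", "Gi0/12")}
print(locate_clients(arp, bridge))
```

In a real network the uplink ports also see the client's MAC address, so a tracker has to use the topology to keep only the access port.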

Get NAV
  Free software – GPLv2
  Debian package ++
  Virtual appliance available
    http://metanav.uninett.no/navappliance



                                               http://metanav.uninett.no
The service monitor Hobbit

    Agent on servers that reports on the "local" status
    Monitors CPU load, disk usage, memory, running
     processes and whatever you script
    Servers are organized in groups. Alarms are shown
     on a per-group basis.
    Drill down to details of when an alarm occurred and
     the reported reason
                                            http://hobbitmon.sourceforge.net/
Use a single event/alarm system
Place your monitor strategically

   A monitor placed in the periphery of your
    network is more likely to be cut off
       place it in a central (network-wise) location
       give it redundant network access (VRRP,
        HSRP…)
   Redundant power, incl. a redundant power
    source (UPS, ideally a standby generator)
   Monitor the monitor!
   Use SMS for alarms in addition to email
       Connect the SMS sending device physically
        to the NMS
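
One simple way to "monitor the monitor" is a heartbeat: the NMS records a timestamp at regular intervals, and an independent watchdog raises an alarm when the timestamp gets too old. The sketch below shows only the staleness check; the function name and the 300 second limit are assumptions, and how the heartbeat is stored or transported is left open.

```python
import time


def monitor_is_alive(last_heartbeat, max_age_s=300, now=None):
    """Return True if the last heartbeat is at most max_age_s seconds old.

    last_heartbeat and now are Unix timestamps; now defaults to the
    current time and is a parameter only to make the check testable.
    """
    if now is None:
        now = time.time()
    return (now - last_heartbeat) <= max_age_s
```

The watchdog should run on a different machine, with a different power and network path than the NMS, otherwise both are likely to fail together.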
Adopt good naming standards

   Do not underestimate the value of sound
    names for your equipment, rooms and
    locations
   The name of the device should in itself give
    an idea of what the device is (does) and
    where it is placed
       Example: mtfs-272-sw
        (a switch in area "mtfs", wiring closet
        "272")
   Also use a thought-through naming standard
    for router interfaces and switch ports
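
A consistent naming standard also makes names machine-readable. The sketch below parses names of the form area-closet-role, generalised from the single example mtfs-272-sw. The pattern and the role table are assumptions: only "sw" (switch) comes from the example, and "gw" is a hypothetical extra code.

```python
import re

# Role codes: "sw" is taken from the example name, "gw" is hypothetical.
ROLES = {"sw": "switch", "gw": "router"}

NAME_RE = re.compile(r"(?P<area>[a-z]+)-(?P<closet>[a-z0-9]+)-(?P<role>[a-z]+)")


def parse_device_name(name):
    """Split a device name into area, wiring closet and role.

    Returns None if the name does not follow the naming standard.
    """
    match = NAME_RE.fullmatch(name)
    if match is None or match["role"] not in ROLES:
        return None
    return {
        "area": match["area"],
        "closet": match["closet"],
        "role": ROLES[match["role"]],
    }


print(parse_device_name("mtfs-272-sw"))
```

The same function can be used as a check when new equipment is registered, so that names which break the standard are caught early.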
NMS Security

   Restrict access to the NMS to authorized crew
    only
       both network access and physical access
   Isolate the management IP addresses of switches
    and base stations in dedicated subnets
   Firmly restrict SNMP access to the network
    equipment – only from the NMS(es).
       remember SNMP v2 security is weak
   Be even more restrictive if you allow/use
    SNMP Write
       consider SNMP v3 or Netconf
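
The two address rules above are easy to audit with Python's standard `ipaddress` module. The sketch below checks whether an SNMP source is one of the NMS hosts and lists devices whose management address lies outside the dedicated subnets. The function names and example addresses are hypothetical; the actual enforcement belongs in access lists on the equipment itself.

```python
import ipaddress


def snmp_allowed(source_ip, nms_hosts):
    """Return True only if the SNMP request comes from one of the NMS hosts."""
    source = ipaddress.ip_address(source_ip)
    return any(source == ipaddress.ip_address(host) for host in nms_hosts)


def outside_mgmt_subnets(device_ips, mgmt_subnets):
    """Return the device addresses that are not inside any management subnet."""
    networks = [ipaddress.ip_network(subnet) for subnet in mgmt_subnets]
    return [
        ip
        for ip in device_ips
        if not any(ipaddress.ip_address(ip) in network for network in networks)
    ]
```

Running `outside_mgmt_subnets` against the equipment inventory from time to time gives a quick list of devices that still need to be moved.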
MIB requirements for your equipment
   Your network equipment should support:

        RFC 3418: SNMPv2-MIB (system)
        RFC 2863: IF-MIB (interfaces, incl. 64-bit counters)
        RFC 4293: IP-MIB (IP interfaces and ARP; IPv4 and IPv6)
        RFC 4133: ENTITY-MIB (modules, optics, software, serial numbers)
              Not supported by Juniper
        RFC 4188: BRIDGE-MIB (bridge table)
        RFC 4363: Q-BRIDGE-MIB (bridge table per vlan, vlan config)
              Not supported by Cisco
        RFC 3635: Etherlike-MIB (duplex)
        RFC 4836: MAU-MIB (medium)
              equipment support seems scarce (HP has support)


   Your NMS should whenever possible use standard/IETF MIBs rather
    than vendor-proprietary MIBs
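
Why the 64-bit counters in IF-MIB matter can be shown with a little arithmetic: a 32-bit octet counter on a fully loaded 1 Gbit/s link wraps in roughly 34 seconds, which is shorter than a typical polling interval. The sketch below computes the wrap time and a wrap-safe counter difference. The function names are hypothetical, and the difference is only correct if the counter wrapped at most once between two polls.

```python
def wrap_seconds(counter_bits, link_bps):
    """Seconds until an octet counter wraps on a fully loaded link."""
    return (2 ** counter_bits) * 8 / link_bps


def counter_delta(previous, current, counter_bits=64):
    """Octets counted between two polls, allowing for a single wrap."""
    return (current - previous) % (2 ** counter_bits)


print(f"32-bit counter at 1 Gbit/s wraps after {wrap_seconds(32, 10**9):.1f} s")
print(f"64-bit counter at 1 Gbit/s wraps after {wrap_seconds(64, 10**9):.3g} s")
```

With 64-bit counters the wrap time is so long that the single-wrap assumption holds for any realistic polling interval.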
Key points – in summary

   Be proactive
   Detect important alarms early
   Inform the users
   Log, log, log (snmp collect)
   Use a number of tools
   Adopt good naming standards
   Value the engineer – small scripts are gold
   Educate your crew!
    (in both NMS operations and procedures)

				