Performance & Monitoring - intERLab

• Day 1: Performance and Monitoring
  – Li Xinman, TEIN2 NOC & CERNET NOC, PhD

• Day 2: Troubleshooting
  – Li Pengfei, CERNET NOC, CCIE

• Day 3: Emergency Response
  – Wang Yan, CERNET NOC, CCIE
Performance & Monitoring

           Li Xinman
         Sept.4-8, 2006
          AIT, Thailand
• Introduction to Performance Management
• TEIN2 NOC updates and NMS
• Performance monitoring technologies and tools
• Netflow and applications
• Case Study
 Functions of Network Management

• Fault management
  – Network state monitoring
  – Failure logging, reporting and tracking etc.
• Configuration management
  – device and software configuration
  – version control (compare, apply and rollback, backup) etc.
• Accounting management
  – billing and traffic measurement etc.
• Performance management
• Security Management
  – Access control, worm/attack detection and alert etc.
 Performance Management-Why
• Why needed and important?
   – Capacity planning
        • when do we need to upgrade our link and device?
   –   Ensure network availability
   –   Verify network performance and QoS (what we expect)
   –   Ensure SLA compliance (what customers expect)
   –   Better understanding and control of the network
   –   Optimization: make the network run better!
• Murphy’s Law (and why we need a NOC)
   – If anything can go wrong, it will.
   – Left to themselves, things tend to go from bad to worse.
     (The network can’t look after itself. That’s good news for us!)
• Proactive or reactive?
   – Know about problems before users and the boss do
   – Solve problems before users complain
   – Or wait for problems to happen and customers to complain?

   – As a NOC, we should be proactive: NOC means NO Complaints!
Performance Management-What
• What’s performance management?
  – Understanding the behavior of a network and its
    elements in response to traffic demands
  – Measuring and reporting network performance
    to ensure that performance is maintained at an
    acceptable level
 Performance Management-How
• How to measure the network performance
  – Delay, jitter, packet loss, bandwidth usage etc.
• The steps and process of performance management:
  –   Data collection
  –   Baseline the network
  –   Determining the threshold for acceptable performance
  –   Tuning
• Technologies and tools needed
  – Data collection technologies such as sniffing and NetFlow
  – QoS
  – Tools: ping, mrtg, iperf, wget, etc.
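The "baseline, then threshold" steps above can be sketched as follows. This is a minimal illustration assuming a simple mean-plus-k-standard-deviations rule with k = 3; the rule, the value of k, the sample data and the function names are our own assumptions, not the method used by the TEIN2 or CERNET NOCs.

```python
import statistics


def baseline(samples):
    """Summarise historical samples (e.g. link utilisation %) as mean and std-dev."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def threshold(samples, k=3.0):
    """Assumed rule: acceptable performance is anything up to mean + k * std-dev."""
    mean, sd = baseline(samples)
    return mean + k * sd


def exceeds_threshold(value, samples, k=3.0):
    """True when a new measurement falls outside the baseline-derived threshold."""
    return value > threshold(samples, k)


history = [10, 12, 11, 13, 14]  # hypothetical utilisation samples, in percent
print(baseline(history))               # mean 12.0, std-dev about 1.41
print(exceeds_threshold(20, history))  # True
print(exceeds_threshold(15, history))  # False
```

In practice the baseline would be built from weeks of collected data, and often per time of day, since traffic follows daily and weekly cycles.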
                Delay (Latency)
• Delay = propagation delay + serialization delay
• Propagation delay: the time it takes for the physical
  signal to traverse the path; it depends on distance
  (add about 6 ms per 1000 km of fibre link)
   – The delay from Beijing to Guangzhou is about 34 ms
     (CERNET); the distance is about 3000 km.
• Serialization delay is the time it takes to actually
  transmit the packet (clock its bits onto the link); here it
  also covers the queuing, processing and switching time
  added by intermediate networking devices (normally less
  than 1 ms per device, but not for firewalls or heavily
  loaded routers)
• Comfortable human-to-human audio is only possible
  for round-trip delays not greater than 100ms

• Tools: ping, traceroute etc.
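A worked version of the delay rules above. It assumes a signal speed of about 200,000 km/s in fibre (roughly two thirds of the speed of light). The constant and function names are our own. Under this assumption 1000 km costs about 5 ms one way, slightly less than the more conservative 6 ms rule of thumb above. The 3000 km Beijing-Guangzhou path then gives about 30 ms round trip, close to the measured 34 ms if that figure is a ping round-trip time. The pure serialization term is included to show how small it is on fast links.

```python
# Assumption: light travels through optical fibre at roughly 200,000 km/s.
FIBRE_KM_PER_S = 200_000


def propagation_delay_ms(distance_km):
    """One-way propagation delay over a fibre path of the given length."""
    return distance_km * 1000 / FIBRE_KM_PER_S


def serialization_delay_ms(packet_bytes, link_bps):
    """Time needed to clock one packet onto a link of the given bit rate."""
    return packet_bytes * 8 * 1000 / link_bps


one_way = propagation_delay_ms(3000)      # Beijing-Guangzhou, about 3000 km
print(one_way, 2 * one_way)               # 15.0 ms one way, 30.0 ms round trip
print(serialization_delay_ms(1500, 100_000_000))  # 1500-byte packet on 100 Mbit/s
```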
                  Jitter
• Jitter is the variation of the delay, a.k.a. the 'latency variance'; it can
  happen because of:
    – variable queue lengths, which generate variable latencies
    – load balancing over paths with unequal latency
• In general, higher levels of jitter are more likely to occur on
  either slow or heavily congested links. It is expected that the
  increasing use of “QoS” control mechanisms, such as class-based
  queuing and bandwidth reservation, and of higher-speed
  links such as 100M Ethernet, will reduce jitter
• Harmless for many applications, but harmful to real-time applications
  such as voice and video
• Such applications need a jitter buffer to smooth out playback
• Tolerable jitter for VoIP is about 20-30 ms

• Tools: ping etc. For successive RTT samples t1, t2, t3, …:
  J1 = abs(t2-t1), J2 = abs(t3-t2), …
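The jitter formula above (J1 = abs(t2-t1), J2 = abs(t3-t2), …) can be computed from a series of ping RTTs as follows. This is a minimal sketch: the RTT values are hypothetical, and averaging the differences is one simple summary. RTP (RFC 3550) uses a different, smoothed estimator.

```python
def jitter_samples(rtts_ms):
    """J1 = |t2 - t1|, J2 = |t3 - t2|, ... for consecutive RTT samples."""
    return [abs(later - earlier) for earlier, later in zip(rtts_ms, rtts_ms[1:])]


def mean_jitter(rtts_ms):
    """Average of the per-pair differences; 0.0 if fewer than two samples."""
    samples = jitter_samples(rtts_ms)
    return sum(samples) / len(samples) if samples else 0.0


rtts = [10.0, 12.0, 11.0, 15.0]  # hypothetical ping RTTs in ms
print(jitter_samples(rtts))      # [2.0, 1.0, 4.0]
print(mean_jitter(rtts))         # about 2.33
```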
                    Packet Loss
• Loss of one or more packets; it can happen because of:
   – CRC errors caused by the link or hardware
   – a congested link or full queue (tail drop, or even earlier
     drops by RED/WRED)
   – a route change (temporary drop) or a blackhole route
     (persistent drop)
   – an interface or router going down
   – a misconfigured access-list
   – ...
• 1% packet loss is terrible and unusable!

• Tools: ping etc.
         Bandwidth Utilization
• Capacity planning: decide when to upgrade a link, though
  the decision may depend on the available investment
• Better kept below 35% (as commercial ISPs do)
• For CERNET, most links are above 70% and some above
  95%; in our view, 70% is acceptable for R&E networks
• For TEIN2 now, most links are below 15%!

• Tools: MRTG, SNMP tools, telnet etc.
               Network Availability
•   Availability is the metric used to determine uptime and downtime
•   Availability = (uptime)/(total time) = 1-(downtime)/(total time)
•   Network availability is IP-layer reachability
•   Better > 99.9%
•   99.9%
     – 30x24x60x0.1% = 43.2 (minutes), i.e. downtime should be less
       than about 45 minutes per month
• 99.99%
     – 30x24x60x0.01% = 4.3 (minutes), i.e. downtime should be less
       than 5 minutes per month!
• 99.9% is acceptable for R&E networks (even 99.0% is acceptable);
  some commercial ISPs can reach 99.99%
• Network devices are often specified as 99.999% available,
  but in practice even the top vendors do not achieve this
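The downtime budgets above follow directly from the availability formula. The sketch below reproduces them, assuming a 30-day month as on the slide; the function names are our own.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 30-day month, as on the slide


def availability(uptime, total_time):
    """Availability = uptime / total time (any consistent unit)."""
    return uptime / total_time


def allowed_downtime_minutes(availability_pct, total_minutes=MINUTES_PER_MONTH):
    """Downtime budget implied by an availability target."""
    return total_minutes * (100 - availability_pct) / 100


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per month")
```

Each extra "nine" divides the budget by ten: roughly 432, 43.2, 4.3 and 0.4 minutes per month.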
    Packets Per Second (PPS)
• Important for performance: network
  performance (delay, packet loss) is strongly
  affected by PPS, because a higher packet rate
  loads the intermediate routers and increases
  their processing and queuing delay
• PPS is a very important metric for detecting
  DoS/DDoS traffic
  – E.g. normally the PPS of one GE link is about
    100,000 (baseline); if it rises sharply to 200,000
    PPS, a DoS attack is likely.

• Easy to get: show interface
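The DoS example above compares a packet rate against its baseline. A minimal sketch follows, assuming the rate is derived from two readings of a packet counter (for example from `show interface` or SNMP) and that a jump to twice the baseline should raise an alarm. The factor of 2, the counter values and the function names are illustrative assumptions, and counter wrap is ignored here.

```python
def pps(packets_before, packets_after, interval_s):
    """Average packets per second between two readings of a packet counter."""
    return (packets_after - packets_before) / interval_s


def looks_like_dos(current_pps, baseline_pps, factor=2.0):
    """Assumed rule: flag a sudden jump to `factor` times the usual rate."""
    return current_pps >= factor * baseline_pps


baseline_pps = 100_000                    # usual rate of the GE link
current = pps(0, 6_000_000, 30)           # hypothetical counter readings 30 s apart
print(current, looks_like_dos(current, baseline_pps))  # 200000.0 True
```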
   CPU and Memory Utilization
• We focus on routers
• CPU utilization should preferably stay below 30%
• Routers carrying the global routing table need at
  least 512 MB of memory
• QoS: Quality Of Service
• QoS is a set of technologies for managing the network
• QoS is also expressed as a set of performance measurements
  – Delay, Jitter, packet loss, availability, bandwidth
    utilization etc.
• IP QoS: QoS for IP service
                QoS Architecture
• Best Effort
• IntServ
   –   End to end, session state needed
   –   RSVP
   –   CPU and Memory intensive
   –   Difficult to deploy
   –   Not scalable
• DiffServ
   – PHB: Per-Hop-Behavior, Not end-to-end
   – Scalable
   – Easy to deploy
• What is using now: DiffServ + IP, DiffServ + MPLS
• If network bandwidth is sufficient, there is no need for QoS
QoS Practice: Traffic Policing (rate-limit, CAR)

 • 40Mbps for all outbound traffic
    interface FastEthernet2/0
    rate-limit output 40000000 400000 400000 conform-action
      transmit exceed-action drop
 • 40Mbps for traffic matched by ACL 110 (here: all traffic except HTTP)
      interface FastEthernet2/0
        rate-limit output access-list 110 40000000 400000 400000
          conform-action transmit exceed-action drop
        access-list 110 deny tcp any any eq www
        access-list 110 deny tcp any eq www any
        access-list 110 permit ip any any
QoS Practice: Modular QoS Command

1) Classify the traffic (define a traffic class)
    class-map match-any limit-campus
       match access-group 170
2) Define the traffic policy
    policy-map limit-30M
     class limit-campus
        police 30000000 30000 30000 conform-action transmit
3) Apply the traffic policy
    interface GigabitEthernet5/2
       service-policy input limit-30M
       service-policy output limit-30M
Traffic classification example
                SLA and QoS
• SLA: Service Level Agreement
• SLA is the agreement between a service provider and a
  customer; it defines the quality of the service the
  provider delivers, in terms of delay, jitter,
  packet loss etc.
• SLA is a very important part of the business contract,
  and can also be used to distinguish the service levels
  of different ISPs

            Business: SLA          Technology: QoS
               SLA example: Level 3

      SLA example: Sprintlink

                              Delay    Packet Loss   Availability   Jitter
North America                 55 ms    0.30%         99.90%         2 ms
Europe                        44 ms    0.30%         99.90%         2 ms
Asia                          105 ms   0.30%         99.90%         2 ms
South Pacific                 70 ms    0.30%         99.90%         2 ms
Continental US (Peerless IP)  55 ms    0.1%          n/a            2 ms
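An SLA turns into a simple compliance check once monthly measurements are available. The sketch below uses the Sprintlink North America row as the contract values. The measured figures and the function name are hypothetical. Note that availability is a lower limit while delay, loss and jitter are upper limits.

```python
# Sprintlink North America figures from the SLA table.
SLA_NORTH_AMERICA = {"delay_ms": 55, "loss_pct": 0.30, "jitter_ms": 2, "availability_pct": 99.90}

UPPER_LIMITS = ("delay_ms", "loss_pct", "jitter_ms")  # must not exceed the SLA value


def sla_violations(measured, sla):
    """Return the names of metrics that miss the SLA."""
    bad = [m for m in UPPER_LIMITS if measured[m] > sla[m]]
    if measured["availability_pct"] < sla["availability_pct"]:  # a lower limit
        bad.append("availability_pct")
    return bad


month = {"delay_ms": 60, "loss_pct": 0.1, "jitter_ms": 1.5, "availability_pct": 99.95}  # hypothetical
print(sla_violations(month, SLA_NORTH_AMERICA))  # ['delay_ms']
```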
        Measurement Technology
• We now know which metrics describe network
  performance, but how do we measure them?
• Technologies and tools
  –   ping, traceroute, telnet and CLI commands etc.
  –   SNMP
  –   NetFlow (Cisco), J-Flow (Juniper), sFlow (multi-vendor), NetStream (Huawei)
  –   IP SLA (Cisco)
  –   Etc.
                       Ping
• Normally used as a troubleshooting tool
• Uses ICMP Echo messages to determine:
  – Whether a remote device is active (for troubleshooting)
  – Round-trip time (RTT), but not one-way delay
  – Packet loss
• Sometimes we need to specify the source address and
  packet length, using extended ping on a router or host
  – Why use large packets when pinging?
     (To test the link quality and throughput.)
  – Large packet ping is prohibited in Windows, but Linux is ok
                   Sample Ping
Freebsd>% ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=253 time=0.326 ms
64 bytes from icmp_seq=6 ttl=253 time=0.288 ms
6 packets transmitted, 6 received, 0% packet loss, time 4996ms
rtt min/avg/max/mdev = 0.239/0.284/0.326/0.025 ms

router# ping
Protocol [ip]:
Target IP address:
Repeat count [5]:
Datagram size [100]: 3000
Timeout in seconds [2]:
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 3000-byte ICMP Echos to, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms
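Ping results are easy to collect automatically by parsing the summary lines. This is a minimal sketch that assumes the Linux/FreeBSD summary format shown in the first sample; Cisco IOS output, like the second sample, would need a different pattern. The regular expressions and function name are our own.

```python
import re

# Summary lines as printed by Linux/FreeBSD ping (Cisco IOS output differs).
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)")


def parse_ping(output):
    """Return (sent, received, loss %, (min, avg, max) RTT in ms or None)."""
    summary = SUMMARY_RE.search(output)
    if summary is None:
        raise ValueError("no ping summary line found")
    sent, received = int(summary.group(1)), int(summary.group(2))
    loss_pct = 100.0 * (sent - received) / sent
    rtt = RTT_RE.search(output)
    rtt_ms = tuple(float(x) for x in rtt.groups()) if rtt else None
    return sent, received, loss_pct, rtt_ms


sample = """6 packets transmitted, 6 received, 0% packet loss, time 4996ms
rtt min/avg/max/mdev = 0.239/0.284/0.326/0.025 ms"""
print(parse_ping(sample))  # (6, 6, 0.0, (0.239, 0.284, 0.326))
```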
                     Traceroute
• Can be used to measure the RTT delay, and also the
  delay between the routers along the path
• Unix/Linux traceroute sends UDP datagrams with
  increasing TTL values to discover the route packets
  take to the destination; Microsoft Windows tracert
  uses ICMP. If Windows tracert appears to show
  continuous timeouts, a router may be filtering
  ICMP traffic: try a Unix/Linux traceroute
• After the Nachi worm, many ISPs filter ICMP traffic,
  so ping may not work while (UDP-based) traceroute still does


      H1 --2ms-- router1 --15ms-- router2 --2ms-- router3
          Sample Traceroute

Router# traceroute
Type escape sequence to abort.
Tracing the route to

 1    0 msec    0 msec 0 msec
 2    20 msec   20 msec 16 msec
 3    28 msec   28 msec 24 msec
 4    24 msec   *       24 msec
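The per-link delays in the diagram come from the difference between the RTTs of consecutive hops. The sketch below uses the minimum probe of each hop, an assumption chosen to reduce queuing noise, and subtracts the previous hop's value. Negative or zero increments, such as hop 4 in the sample, are normal. They happen because routers answer traceroute probes with low priority. A hop where every probe timed out would need special handling (here `min` would raise an error).

```python
def hop_increments(hop_rtts_ms):
    """Per-hop delay estimate from traceroute output.

    hop_rtts_ms holds one list of probe RTTs per hop; None marks a '*' timeout.
    Each hop's increment is its best (minimum) RTT minus the previous hop's.
    """
    increments, previous = [], 0
    for probes in hop_rtts_ms:
        best = min(r for r in probes if r is not None)
        increments.append(best - previous)
        previous = best
    return increments


# The four hops of the sample traceroute, in msec.
sample_hops = [[0, 0, 0], [20, 20, 16], [28, 28, 24], [24, None, 24]]
print(hop_increments(sample_hops))  # [0, 16, 8, 0]
```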
                    Visual Route
• Visualization of traceroute information
      telnet and CLI commands
• Using telnet manually, or scripts written with Expect
  that telnet to the network device and issue CLI
  commands, is also a useful and basic monitoring
  method for collecting performance data
• It’s necessary because some data can only be
  accessed through CLI commands and is not
  supported by SNMP etc. (How about the config file?)
                       Show interface
• Bandwidth utilization information, PPS etc
• Examples
    – show interface GigabitEthernet2/24
    GigabitEthernet2/24 is up, line protocol is up (connected)
     Description: to-tein2-xing-20060119
     Internet address is
     MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
       reliability 255/255, txload 33/255, rxload 14/255   (about 13% and 5.5%)
     Input queue: 0/75/1/0 (size/max/drops/flushes); Total output drops: 0
     Queueing strategy: fifo
     Output queue: 0/40 (size/max)
     5 minute input rate 55010000 bits/sec, 17367 packets/sec
     5 minute output rate 133299000 bits/sec, 18476 packets/sec
     L2 Switched: ucast: 235554 pkt, 32942922 bytes - mcast: 44728 pkt, 4631058 bytes
     L3 in Switched: ucast: 7786262800 pkt, 2957731471301 bytes - mcast: 0 pkt, 0
       bytes mcast
     L3 out Switched: ucast: 8883546304 pkt, 7850287572491 bytes mcast: 0 pkt, 0
    – ......
• It’s better not to change the bandwidth setting (not even to tune the OSPF
  metric): load figures such as txload/rxload are computed against it
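The same utilization and PPS figures can be extracted from `show interface` output by a script, for example one driven by Expect over telnet as described earlier. This sketch assumes the Cisco line formats shown in the sample above. It computes utilization against the configured BW value, which is why changing that value distorts the result. The regular expressions and function name are our own.

```python
import re

RATE_RE = re.compile(r"5 minute (input|output) rate (\d+) bits/sec, (\d+) packets/sec")
BW_RE = re.compile(r"BW (\d+) Kbit")


def interface_load(show_output):
    """Map 'input'/'output' to (utilisation % of BW, packets/sec)."""
    bw_bps = int(BW_RE.search(show_output).group(1)) * 1000
    return {
        direction: (100.0 * int(bits) / bw_bps, int(pkts))
        for direction, bits, pkts in RATE_RE.findall(show_output)
    }


sample = """MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
 5 minute input rate 55010000 bits/sec, 17367 packets/sec
 5 minute output rate 133299000 bits/sec, 18476 packets/sec"""
for direction, (util, pps) in interface_load(sample).items():
    print(f"{direction}: {util:.1f}% of BW, {pps} pps")
```

For this sample the script reports about 5.5% input and 13.3% output utilization, matching rxload 14/255 and txload 33/255.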
           Show process cpu/mem
• Measure the usage of CPU and memory
• router1>sh proc cpu
   CPU utilization for five seconds: 2%/0%; one minute: 5%; five minutes: 5%
    PID Runtime(ms) Invoked        uSecs 5Sec 1Min 5Min TTY Process
      1      8     91       87 0.00% 0.00% 0.00% 0 Chunk Manager
      2   5876 4393609           1 0.00% 0.00% 0.00% 0 Load Meter
      3   1400 200869           6 0.00% 0.00% 0.00% 0 BGP Open
      4      0      1       0 0.00% 0.00% 0.00% 0 EE48 TCAM Carve
      5 50811784 2895942         17545 0.00% 0.25% 0.22% 0 Check heaps

• Sometimes the CPU usage of the processes ‘IP Input’
  and ‘BGP Scanner’ will be very high
• Remember not to use up all of the router’s telnet (VTY) sessions!
  Otherwise you will be locked out of the router.
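The header line of `show processes cpu` can be checked against the 30% guideline automatically. This is a minimal sketch that assumes the IOS header format shown above. In that format, the figure after the slash in the five-second value is the share spent at interrupt level. The function name and the choice of the five-minute average for the check are our own assumptions.

```python
import re

CPU_RE = re.compile(
    r"five seconds: (\d+)%/(\d+)%; one minute: (\d+)%; five minutes: (\d+)%"
)


def check_cpu(show_proc_cpu_line, limit_pct=30):
    """Return (five-minute CPU %, whether it exceeds the limit)."""
    match = CPU_RE.search(show_proc_cpu_line)
    if match is None:
        raise ValueError("unrecognised 'show processes cpu' header")
    # Fields: 5-second total, 5-second interrupt share, 1-minute, 5-minute.
    _total_5s, _interrupt_5s, _one_min, five_min = map(int, match.groups())
    return five_min, five_min > limit_pct


line = "CPU utilization for five seconds: 2%/0%; one minute: 5%; five minutes: 5%"
print(check_cpu(line))  # (5, False)
```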
                       SNMP
• SNMP is an Internet-standard management
  framework that provides facilities for managing
  and monitoring network resources
• Components of SNMP
   – MIB: Management Information Base
   – SNMP agent: software that runs on a network device and
     maintains the MIB
   – SNMP manager: an application program that contacts the agent
     to query or modify the MIB at the agent
   – SNMP protocol: the application-layer protocol used
     by SNMP agents and managers to send and receive
     data; the data is encoded in BER (Basic Encoding Rules)
   – SMI: Structure of Management Information, the
     standard that defines how to create a MIB
SNMP Architecture
• A MIB specifies the managed objects
• MIB is a text file that describes managed objects
  using the syntax of ASN.1 (Abstract Syntax Notation 1)
• ASN.1 is a formal language for describing data and its properties

• In Linux, MIB files are in the directory
   – Multiple MIB files
   – RFC1213-MIB.txt, MIB-II (defined in RFC 1213) defines the
     managed objects of TCP/IP networks
               Managed Objects
• Each managed object is assigned an object identifier (OID)
• The OID is specified in a MIB file.
• An OID can be represented as a sequence of integers
  separated by decimal points or by a text string:
   –      (looks like IPv6 address? )

• When a SNMP manager requests an object, it sends
  the OID to the SNMP agent.
    Organization of Managed Objects
• Managed objects are organized in a tree-like
  hierarchy, and the OIDs reflect the structure of
  the hierarchy.
• Each OID represents a node in the tree.
• The OID of mib-2 is at the top of the
  hierarchy for all managed objects of the MIB-II.
• Manufacturers of networking equipment can add
  product-specific objects to the private(4) subtree.

  root
    iso(1)
      org(3)
        dod(6)
          internet(1)
            directory(1)
            mgmt(2)
              mib-2(1)
                system(1)
                interfaces(2)
                at(3)
                ip(4)
                  ipForwDatagrams(6)
                icmp(5)
                tcp(6)
                udp(7)
                egp(8)
                transmission(10)
                snmp(11)
            experimental(3)
            private(4)
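Walking down the tree and collecting the node numbers gives the dotted OID. The sketch below encodes only part of the MIB tree on this slide as nested dictionaries, a data layout chosen purely for illustration. It shows that mib-2 is 1.3.6.1.2.1 and that ipForwDatagrams is 1.3.6.1.2.1.4.6.

```python
# Part of the MIB tree on this slide: name -> (number, children).
MIB_TREE = {
    "iso": (1, {
        "org": (3, {
            "dod": (6, {
                "internet": (1, {
                    "directory": (1, {}),
                    "mgmt": (2, {
                        "mib-2": (1, {
                            "system": (1, {}),
                            "interfaces": (2, {}),
                            "ip": (4, {"ipForwDatagrams": (6, {})}),
                        }),
                    }),
                    "experimental": (3, {}),
                    "private": (4, {}),
                }),
            }),
        }),
    }),
}


def resolve(path):
    """Turn a list of node names into a dotted OID string."""
    node, numbers = MIB_TREE, []
    for name in path:
        number, node = node[name]
        numbers.append(str(number))
    return ".".join(numbers)


MIB2 = ["iso", "org", "dod", "internet", "mgmt", "mib-2"]
print(resolve(MIB2))                              # 1.3.6.1.2.1
print(resolve(MIB2 + ["ip", "ipForwDatagrams"]))  # 1.3.6.1.2.1.4.6
```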
Definition of Managed Object in a MIB

1. OBJECT
     –   String that describes the MIB object
     –   Object Identifier (OID)
2. SYNTAX
     –   Defines what kind of info is stored in the MIB object
3. ACCESS
     –   READ or WRITE (e.g. read-only, read-write)
4. STATUS
     –   State of the object: mandatory, optional, obsolete or deprecated
5. DESCRIPTION
     –   Reason why the MIB object exists

Standard MIB Object:
     sysUpTime OBJECT-TYPE
         SYNTAX TimeTicks
         ACCESS read-only
         STATUS mandatory
         DESCRIPTION
             "Time since the network management portion of
              the system was last re-initialised."
         ::= {system 3}
IF-MIB (64-bit counters)
                    SNMP Protocol
• Client/server based: the manager pulls data (polling) and the agent pushes traps
• Ports: UDP 161 (SNMP messages), UDP 162 (trap messages)
• The SNMP manager and the SNMP agent communicate using the SNMP protocol
   – Generally: the manager sends queries and the agent responds
   – Exception: traps are initiated by the agent
             SNMP Functions
1. Get-request. Requests the values of one or more
   objects
2. Get-next-request. Requests the value of the next
   object, according to a lexicographical ordering of
   OIDs
3. Set-request. A request to modify the value of one or
   more objects
4. Get-response. Sent by SNMP agent in response to
   a get-request, get-next-request, or set-request
5. Trap. An SNMP trap is a notification sent by an
   SNMP agent to an SNMP manager, which is
   triggered by certain events at the agent
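The lexicographical ordering used by Get-next compares OIDs component by component as integers. Compared as text, "…1.10" would wrongly sort before "…1.9". Repeating Get-next until the answer leaves a subtree is exactly what a walk does. The sketch below models an agent as a plain dictionary; the agent contents and function names are hypothetical, and no real SNMP library is used.

```python
def oid_key(oid):
    """Compare OIDs component by component as integers, not as text."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def get_next(mib, oid):
    """Return the (oid, value) pair that follows `oid`, or None at the end."""
    key = oid_key(oid)
    for candidate in sorted(mib, key=oid_key):
        if oid_key(candidate) > key:
            return candidate, mib[candidate]
    return None


def walk(mib, root):
    """Repeated get-next, stopping once we leave the subtree under `root`."""
    prefix, current, results = oid_key(root), root, []
    while (nxt := get_next(mib, current)) is not None:
        if oid_key(nxt[0])[: len(prefix)] != prefix:
            break
        results.append(nxt)
        current = nxt[0]
    return results


# Hypothetical agent contents (sysDescr.0, sysUpTime.0, ifNumber.0).
agent = {
    "1.3.6.1.2.1.1.1.0": "Cisco IOS Software",
    "1.3.6.1.2.1.1.3.0": 494755800,
    "1.3.6.1.2.1.2.1.0": 3,
}
print(get_next(agent, "1.3.6.1.2.1.1"))  # ('1.3.6.1.2.1.1.1.0', 'Cisco IOS Software')
print(walk(agent, "1.3.6.1.2.1.1"))      # sysDescr.0 and sysUpTime.0 only
```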
• Traps are triggered by an event
• Defined traps include:
   –   linkDown: event that an interface went down
   –   coldStart - unexpected restart (i.e., system crash)
   –   warmStart - soft reboot
   –   linkUp - the opposite of linkDown
   –   (SNMP) AuthenticationFailure
   –   …
• Traps can be received by a management application
  and handled in several ways: logging, paging,
  alerting, or simply ignoring them
                   SNMP Versions
• Three versions are in use today:
   – SNMPv1 (1990)
   – SNMPv2c (1996)
      • Adds the “GetBulk” operation and some new data types (such as 64-bit counters)
      • Adds RMON (remote monitoring) capability
      • The SNMPv2 variant that came into common use; SNMPv2u and SNMPv2*,
        which added security features, did not.
   – SNMPv3 (2002)
      • SNMPv3 started from SNMPv1 (and not SNMPv2c)
      • Addresses security

• All versions are still in use today, but versions 1 and 2c are
  the most common; don’t bother with version 3 unless you need its security
• Many SNMP agents and managers support all three
  versions of the protocol
      SNMP Community Strings
• Like passwords
• Two kinds:
   - READ-ONLY: You can send out a Get & GetNext to the
     SNMP agent, and if the agent is using the same read-only
     string it will process the request.
   - READ-WRITE: Get, GetNext, and Set. If a MIB object has an
     ACCESS value of read-write, then a Set PDU can change the
     value of that object with the correct read-write community
• Default community strings: public (read-only), private (read-write)
• Keep the R/W community string secret! In fact,
  the R/W community is often not necessary at all!
                 SNMP Security
• SNMPv1 uses community strings for authentication,
  sent as plain text without encryption
• SNMPv2 was supposed to fix the security problems, but the
  effort was derailed (the “c” in SNMPv2c stands for “community”)
• SNMPv3 has numerous security features: integrity,
  authentication and privacy
  – Instead of granting access rights to a community, SNMPv3
    grants access to users
  – Access can be restricted to sections of the MIB (View-based
    Access Control Model, VACM). Access rights can be limited
     • by specifying a range of valid IP addresses for a user, or
     • by specifying the part of the MIB tree that can be accessed
           SNMP Configuration
• Configuring SNMP access
   snmp-server community notpublic ro
   snmp-server community topsecret rw 60
   access-list 60 permit
   access-list 60 permit
• Configuring Traps
   snmp-server host public
   snmp-server enable traps
   snmp-server enable traps bgp
   snmp-server enable traps snmp bgp
   snmp-server trap-source loopback 0
• About View (for security)
   snmp-server view testview included      (mib-2)
   snmp-server view testview included   (cisco)
   snmp-server community test1 testview ro 60
       ifIndex – Interface Name?
• ifIndex is the unique value that identifies an interface of a router
• show snmp mib ifmib ifindex interface
   – to show the ifindex of interfaces, e.g.
      (router)#sh snmp mib ifmib ifindex pos9/0
                Interface = POS9/0, Ifindex = 28
   – Or snmpwalk?

• Most management software, such as MRTG, uses ifIndex for data
  collection and monitoring; in SNMP it is part of the OID
• But it may change after a router reboot
• snmp-server ifindex persist
   – Keep from changing when reboot
               System MIB (MIB-II)





                        MIB instances
• Each MIB object can have one instance; some have more
• A MIB for a router’s (entity) interface information:
  iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) interfaces(2) ifTable(2) ifEntry(1)

• One ifEntry is required per interface (e.g. 3 interfaces, 3 entries)
• One MIB object definition can represent multiple
  instances through Tables, Entries, and Indexes

                          ENTRY + INDEX = INSTANCE

                           ifType(3)          ifMtu(4)       Etc…

             Index #1      ifType.1 = 6       ifMtu.1
             Index #2      ifType.2 = 9       ifMtu.2
             Index #3      ifType.3 = 15      ifMtu.3
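"ENTRY + INDEX = INSTANCE" means an instance OID is the ifEntry OID, then the column number of the object, then the interface index. The sketch below builds such OIDs for a few MIB-II ifEntry columns (column numbers as defined in RFC 1213); the constant and function names are our own.

```python
# ifEntry = iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry
IF_ENTRY = "1.3.6.1.2.1.2.2.1"

# Column numbers of some ifEntry objects, as defined in RFC 1213 (MIB-II).
IF_COLUMNS = {
    "ifIndex": 1, "ifDescr": 2, "ifType": 3, "ifMtu": 4,
    "ifSpeed": 5, "ifInOctets": 10, "ifOutOctets": 16,
}


def instance_oid(column, if_index):
    """ENTRY + INDEX = INSTANCE: column OID followed by the interface index."""
    return f"{IF_ENTRY}.{IF_COLUMNS[column]}.{if_index}"


for index in (1, 2, 3):
    print(instance_oid("ifType", index))  # 1.3.6.1.2.1.2.2.1.3.1, .3.2, .3.3
```

This is also why a changing ifIndex breaks tools such as MRTG: the index is baked into every OID they poll.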
        SNMP Operation: snmpget
•   Example 1:
    –   MIB:
    –   Results:
         $ snmpget -v 1 test888 .
         system.sysDescr.0 = Cisco Internetwork Operating System Software
         IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE SOFTWARE
         TAC Support:
         Copyright (c) 1986-2002 by cisco Systems, Inc.
         Compiled Sun 22-Dec-02 02:49 by ccai
•   Example 2:
    –   MIB:
    –   Results:
         $ snmpget -v 2c test888 .
         system.sysUpTime.0 = Timeticks: (494755800) 57 days, 6:19:18.00
     SNMP Operation: snmpset
•   MIB
•   Operation
    $ snmpget -v 1 write888 .
    system.sysContact.0 = test
    $ snmpset -v 1 write888 . s
       "CERNET NOC"
    system.sysContact.0 = CERNET NOC
    $ snmpget -v 1 write888 .
    system.sysContact.0 = CERNET NOC
    SNMP Operation: snmpwalk
•   MIB
•   Operation
    $ snmpwalk -v 2c test888 .
    system.sysDescr.0 = Cisco Internetwork Operating System Software
    IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE
        SOFTWARE (fc2)
    TAC Support:
    Copyright (c) 1986-2002 by cisco Systems, Inc.
    Compiled Sun 22-Dec-02 02:49 by ccai
    system.sysObjectID.0 = OID: enterprises.9.1.208
    system.sysUpTime.0 = Timeticks: (494811433) 57 days, 6:28:34.33
    system.sysContact.0 = "CERNET NOC, 86-10-62784048"
    system.sysName.0 = cernoclab
    system.sysLocation.0 = "THU Main Building Room306"
    system.sysServices.0 = 78
    system.sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMP Operation: snmpbulkget
•   MIB
•   Operation
    $ snmpbulkget -v 2c -B 0 10 test888 .
    system.sysDescr.0 = Cisco Internetwork Operating System Software
    IOS (tm) C2600 Software (C2600-I-M), Version 12.2(11)T3, RELEASE
        SOFTWARE (fc2)
    TAC Support:
    Copyright (c) 1986-2002 by cisco Systems, Inc.
    Compiled Sun 22-Dec-02 02:49 by ccai
    system.sysObjectID.0 = OID: enterprises.9.1.208
    system.sysUpTime.0 = Timeticks: (494914259) 57 days, 6:45:42.59
    system.sysContact.0 = CERNET NOC
    system.sysName.0 = cernoclab
    system.sysLocation.0 = "THU Main Building Room306"
    system.sysServices.0 = 78
    system.sysORLastChange.0 = Timeticks: (0) 0:00:00.00
    interfaces.ifNumber.0 = 3
    interfaces.ifTable.ifEntry.ifIndex.1 = 1
Interface MIB (MIB-II, 32bit counters)
 Interface MIB (MIB-II) Operation
$ snmpget -v 2c test888 .
interfaces.ifTable.ifEntry.ifDescr.1 = FastEthernet0/0

$ snmpget -v 2c test888 .
interfaces.ifTable.ifEntry.ifInOctets.1 = Counter32: 2984051368

$ snmpget -v 2c test888 .
interfaces.ifTable.ifEntry.ifOutOctets.1 = Counter32: 490955885
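ifInOctets and ifOutOctets are cumulative counters, so utilization comes from the difference between two polls. This is what MRTG does. The sketch handles a single wrap of a 32-bit counter with modular arithmetic. A Counter32 octet counter wraps in about 34 seconds at a full 1 Gbit/s, so with 5-minute polling on fast links several wraps can go undetected. That is the motivation for the 64-bit counters of SNMPv2c. The function names and sample numbers are illustrative.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Increase of an SNMP counter, allowing for a single wrap past zero."""
    return (new - old) % modulus


def utilization_pct(old_octets, new_octets, interval_s, if_speed_bps):
    """Average link utilisation between two ifInOctets/ifOutOctets polls."""
    bits = counter_delta(old_octets, new_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Seconds for a Counter32 octet counter to wrap at a full 1 Gbit/s:
print(COUNTER32_MODULUS * 8 / 1_000_000_000)               # about 34.4
print(counter_delta(COUNTER32_MODULUS - 100, 50))           # 150 (one wrap)
print(utilization_pct(0, 375_000_000, 300, 100_000_000))    # 10.0
```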
              Cisco Interface MIB
     Cisco Interface MIB Operation
• Operation
   $ snmpget -v 2c 202.112.xx.xx public .
   enterprises. = "bj-a1 to bj1 10G"
   $ snmpget -v 2c 202.112.xx.xx public .
   enterprises. = "C6k 10000Mb 802.3"
   $ snmpget -v 2c 202.112.xx.xx public .
   enterprises. = 1179992000
   $ snmpget -v 2c 202.112.xx.xx public .
   enterprises. = 1835180000
• Show interface
   bj-a1-bgw#sh int te7/3
   TenGigabitEthernet7/3 is up, line protocol is up (connected)
    Hardware is C6k 10000Mb 802.3, address is 0014.a9f7.be80 (bia
    Description: bj-a1 to bj1 10G
    5 minute input rate 1177610000 bits/sec, 327712 packets/sec
    5 minute output rate 1835759000 bits/sec, 358057 packets/sec
                        RMON
• Remote Monitoring specification: provides standard
  information that a network administrator can use to
  monitor, analyze, and troubleshoot a group of
  distributed local area networks (LANs) and
  interconnecting lines from a central site
• RMON is for traffic management
• Specified as part of the MIB, as an extension of MIB-II
• The latest version is RMON Version 2 (referred to as
  "RMON 2" or "RMON2")
• RMON can be supported by hardware monitoring
  devices (known as "probes"), by software, or by
  some combination
                Diagram of RMON MIB
   ISO > Org > DoD > ... > Mgmt > MIB 1&2 > RMON

   RMON1                          RMON2
   1. Statistics                  11. Protocol Directory
   2. History                     12. Protocol Distribution
   3. Alarm                       13. Address Map
   4. Hosts                       14. Network-Layer Host
   5. Host Top N                  15. Network-Layer Matrix
   6. Matrix                      16. Application-Layer Host
   7. Filter                      17. Application-Layer Matrix
   8. Capture                     18. User History
   9. Event                       19. Probe Configuration
   10. Token Ring                 20. RMON Conformance
            RMON MIB Groups
1. Statistics - Traffic and error rates on a segment
2. History - Above statistics with a time stamp
3. Alarm - User defined threshold alarms on any RMON variable
4. Hosts - Traffic and error rates for each host by MAC address
5. Host Top N - Sorts hosts by top traffic and/or error rates
6. Matrix - Conversation matrix between hosts
7. Filter - Definition of what packet types to capture and store
8. Packet Capture - Creates a capture buffer on the probe that
   can be requested and decoded by the management application
9. Event - Generates log entries and/or SNMP traps
10. Token Ring - Token Ring extensions, most complex group
RMON2 is the standard for monitoring higher protocol layers:

      Network layer up to Application layer      RMON2
      Data Link and Physical layers              RMON
                  SNMP Tools
• CLI Commands
  – snmpget, snmpset, snmpwalk, snmpbulkget, etc.
• MIB Browser
  – iReasoning, SolarWinds etc.
• Large Applications: Network Management
  –   HP OpenView
  –   IBM Tivoli (netview)
  –   Sun NetManager
  –   Etc.
  Commercial SNMP Applications
•   HP OpenView
•   IBM NetView
•   Novell ManageWise
•   Sun MicroSystems Solstice
•   Microsoft SMS Server
•   Compaq Insight Manager
•   SnmpQL - ODBC Compliant
•   Empire Technologies
•   Cinco Networks NetXray
•   SNMP Collector (Win9X/NT)
•   Observer
•   Gordian’s SNMP Agent
•   Castle Rock Computing
•   Advent Network Management
•   SimpleAgent, SimpleTester
SNMP Tools-GUI (MIB Browser)
                    MRTG
• The Multi Router Traffic Grapher: freeware
  written in Perl that works on Unix/Linux and graphs
  data collected via SNMP from routers and other
  devices or applications.
• One of the most popular network monitoring
  tools in use today, for monitoring the
  bandwidth utilization of network links
• SNMPv2c support: no more counter wrap problems (64-bit counters)
           Configuration of MRTG
• Use cfgmaker to generate a configuration file, then tune it by hand
   cfgmaker public@ | tee test.cfg
• Set up a crontab entry (/etc/crontab) that runs MRTG every 5 minutes
   */5 * * * * wang /usr/bin/mrtg /home/wang/mrtg/test1.cfg
• Two basic object types in MRTG
   – Counter: an object that returns an unsigned integer that grows
     over time
   – Gauge: an integer that goes up and down according to the
     variable it tracks
      Options[_]: gauge, growright
• Enable snmpv2c:
   Target[]: 28:test888@        Version 1 (default)
   Target[]: 28:test888@   Version 2c
MRTG Example
Bandwidth Utilization Monitoring
Delay & Packet Loss

                    Iperf
• Client/server application that
  – Measures maximum TCP bandwidth
  – Allows tuning of TCP and UDP parameters
  – Reports bandwidth, jitter, and packet loss
Performance Management Process




         Performance Matrix
•   Traffic Matrix
•   Delay Matrix
•   Packet Loss Matrix
•   …….
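Each of these matrices is a table indexed by source and destination, filled from measurements taken between pairs of collection agents. The sketch below builds and prints a delay matrix. The PoP names A, B and C and their RTT values are hypothetical, and "-" marks pairs that were not measured.

```python
from collections import defaultdict


def build_matrix(measurements):
    """measurements: iterable of (source, destination, value) tuples."""
    matrix = defaultdict(dict)
    for src, dst, value in measurements:
        matrix[src][dst] = value
    return matrix


def render(matrix):
    """Plain-text table with sources as rows and destinations as columns."""
    nodes = sorted(set(matrix) | {d for row in matrix.values() for d in row})
    lines = ["      " + "".join(f"{n:>8}" for n in nodes)]
    for src in nodes:
        cells = (matrix.get(src, {}).get(dst, "-") for dst in nodes)
        lines.append(f"{src:<6}" + "".join(f"{str(c):>8}" for c in cells))
    return "\n".join(lines)


# Hypothetical average RTTs (ms) between three PoPs A, B and C.
delay = build_matrix([("A", "B", 34), ("B", "A", 35), ("A", "C", 12), ("C", "B", 40)])
print(render(delay))
```

The same structure serves for traffic (bytes between PoPs) or packet loss (percent) by changing what the value field holds.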
Distributed Backbone Performance
      Monitoring Architecture



  Performance data collection agents in infrastructure
         Data Collection Agent
• Routers?
  – Embedded: if the router is powerful enough, it’s OK
  – Dedicated routers: shadow router
     • A Cisco 26xx/28xx is enough
     • Stable and easy to deploy
     • Mature software solutions
• Servers?
  – Embedded: if the load on the server is not heavy, it’s good
  – Dedicated servers: test server
     • Flexible: monitor anything you like
     • Easy: free tools are quite enough
         – ping, traceroute, iperf, wget, beacon etc.
     • Low cost: a normal 1U PC server is not as expensive as a
       router
Cisco Performance Measure
        Introduction of IP SLA
• Allows users to monitor network performance
  between Cisco routers, or from a Cisco
  router to a remote IP device.
• Embedded within Cisco IOS software, so
  there is no additional device to deploy, learn,
  or manage.
• A dependable, scalable, cost-effective
  solution for network performance measurement
• Collects network performance information in
  real time: response time, one-way latency,
  jitter, packet loss, voice quality measurement,
  and other network statistics.
 Multi-Protocol Measurement and
Management with Cisco IOS IP SLAs
CERNET: Data Collection Agents Distribution
  (Diagram: a console server at the National Center; data collection
   agents deployed at the core and access layers of the PoPs)
Tools and Technologies Used
   •   Ping
   •   Traceroute
   •   SNMP
   •   telnet
   •   FreeBSD
   •   Perl
   •   RRDtool, GD
   •   Multicast beacon
   •   Iperf
   •   Etc.
Performance Metric Example: Packet Loss
Performance Metric Example: Delay
Performance Metric Example: Multicast
              Thank You!
• Some materials are from the network; thanks
  go to their authors!
