Performance Management
(Best Practices)

REF: www.cisco.com
Document ID 15115
Introduction

• Performance Management involves
 optimization of network response time and
 management of consistency and quality of
 individual and overall network services
  – Need to measure the user/application
    response time
Performance management issues

•   User performance
•   Application performance
•   Capacity planning
•   Proactive fault management

• It is important to note that with newer
    applications such as video and voice, performance
    management is key to success
Indicators for performance
management (1/3)
• Document the network management
  business objectives
• Create detailed and measurable service
  level objectives
• Provide documentation of the service level
  agreements (SLAs), with charts or graphs
  that show how well these agreements
  are met over time
Indicators for performance
management (2/3)
• Collect a list of the variables for the
  baseline, such as polling interval, network
  management overhead incurred, and possible
  trigger thresholds
• Have a periodic meeting that reviews the
  analysis of the baseline and trends.
Indicators for performance
management (3/3)
• Have a what-if analysis methodology
 documented.

• When thresholds are exceeded, develop
 documentation on the methodology used to
 increase network resources.
Performance management process
flow

  1 Develop a network management concept of operation
  2 Measure performance
  3 Perform a proactive fault analysis
Performance management process
flow (1/3)
1 Develop a network management concept
  of operation
  – Define the required features: services,
    scalability and availability objectives
  – Define availability and network management
    objectives
  – Define performance SLAs and metrics
  – Define SLAs
Performance management process
flow (2/3)
2 Measure Performance
  – Gather network baseline data
  – Measure availability
  – Measure response time
  – Measure accuracy
  – Measure utilization
  – Capacity planning
Performance management process
flow (3/3)
3 Perform a proactive fault analysis
  – Use thresholds for proactive fault management
  – Network management implementation
  – Network operation metrics
Develop a network management
concept of operation (1/3)
• The purpose is to describe the overall
  desired system characteristics from an
  operational standpoint
• This document is used to coordinate
  the overall business goals of network
  operations, engineering, design, other
  business units, and the end users.
Define the required features: Services,
Scalability objectives (1/2)

• Define service objectives: what services the
 network provides
  – Understand applications, basic traffic flows,
   and user and site counts
• Define scalability objectives: how many
 users will use the network, and how much
 capacity they will consume
  – Media capacity, number of routes and users
Define the required features: Services,
Scalability objectives (2/2)

• These are the standard performance goals:
  – Response time
  – Utilization
  – Throughput
  – Capacity (maximum throughput rate)
Define availability and network
management objectives (1/2)
• Define availability objectives:
  – Define the level of service (service level
    requirements)
  – Define different classes of service for a particular
    organization
     • A higher availability objective might necessitate
      increased redundancy and support procedures
Define availability and network
management objectives (2/2)
• Define manageability objectives
  – To ensure that overall network management
    does not lack management functionality
     • Must understand the processes and tools used by
       the organization
     • Uncover all important MIB or network tool
       information
Define performance SLAs and
Metrics
• Performance SLA metrics include:
  – Average expected volume of traffic
  – Peak volume of traffic
  – Average response time and maximum
    response time allowed
  – Availability
     • Down time
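
As a minimal sketch of how the response-time metrics above might be checked against measurements, the Python below compares a list of samples with an SLA's average and maximum limits. The class name, field names, limits and sample values are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ResponseTimeSla:
    """Hypothetical SLA limits, in milliseconds."""
    avg_limit_ms: float
    max_limit_ms: float


def check_response_times(samples_ms, sla):
    """Return (average, worst, compliant) for a non-empty list of samples."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    average = mean(samples_ms)
    worst = max(samples_ms)
    # Compliant only if both the average and the worst case are within limits.
    compliant = average <= sla.avg_limit_ms and worst <= sla.max_limit_ms
    return average, worst, compliant


# Illustrative values only.
sla = ResponseTimeSla(avg_limit_ms=100.0, max_limit_ms=250.0)
average, worst, compliant = check_response_times([80.0, 95.0, 120.0, 105.0], sla)
print(f"avg={average:.1f} ms max={worst:.1f} ms compliant={compliant}")
```

A real SLA report would normally apply the same check per reporting period (for example per day or per month) rather than to one flat list.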
Define SLAs

• SLA (Service Level Agreement) – enterprise
• SLM (Service Level Management) – service provider
• SLM includes definitions for problem types and
  severity, and help desk responsibilities
  – Escalation path, and time before escalation at each
    support tier
  – Time to start work on the problem
  – Target time to close, based on priority
  – Services to provide in the areas of capacity planning
    and hardware replacement
Measure Performance

• Gather Network Baseline data
  – Perform a baseline of the network before and
    after a new solution deployment
     • A typical router/switch baseline report includes
       capacity issues related to CPU, memory, buffer,
       link/media utilization, throughput
     • Application baseline: bandwidth used by each
       application per time period
Measure availability

• Availability is the measure of time for
 which a network system or application is
 available to a user
  – Correlate help desk phone calls with the
    statistics collected from managed devices
  – Check scheduled outages
  – Etc.
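
The source gives no formula for availability. One common way to express it, sketched below, is the percentage of agreed service time during which the service was up. Whether scheduled outages count against availability depends on the SLA; this sketch assumes they are excluded from the agreed service time. The function and parameter names are hypothetical.

```python
def availability_percent(period_s, unscheduled_down_s, scheduled_down_s=0.0):
    """Percentage of the agreed service time during which the service was up.

    Assumption: scheduled outages are removed from the measurement period
    before availability is calculated.
    """
    service_time_s = period_s - scheduled_down_s
    if service_time_s <= 0:
        raise ValueError("scheduled outages must be shorter than the period")
    if not 0 <= unscheduled_down_s <= service_time_s:
        raise ValueError("unscheduled downtime must fit within the service time")
    return 100.0 * (service_time_s - unscheduled_down_s) / service_time_s


# Illustrative values: a 1100 s window with 100 s of scheduled maintenance
# and 10 s of unscheduled downtime.
example = availability_percent(1100.0, 10.0, scheduled_down_s=100.0)
print(f"availability = {example:.2f}%")
```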
Measure Response Time

• Network response time is the time required
 for traffic to travel between two points
  – Simple level: pings from the network
    management station to key points in the network
    (not accurate)
  – Server-centric polling: SAA (Service Assurance
    Agent) on a Cisco router to measure response
    time to a destination device
  – Generate traffic that resembles the particular
    application or technology of interest
Measure accuracy

• Accuracy is the measure of interface traffic
 that does not result in errors and can be
 expressed as a percentage
  – Accuracy = 100 - error rate
  – Error rate = ifInErrors * 100 / (ifInUcastPkts
    + ifInNUcastPkts)
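
The accuracy formula can be sketched in Python as follows. Because the SNMP interface counters are cumulative, the sketch assumes each argument is the change in that counter between two polls. The function names, the handling of an idle interval, and the sample numbers are illustrative assumptions.

```python
def error_rate_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts):
    """Inbound error rate in percent, from counter deltas over one interval."""
    total_pkts = d_in_ucast_pkts + d_in_nucast_pkts
    if total_pkts == 0:
        # Assumption: an interval with no received packets is reported as 0%.
        return 0.0
    return d_in_errors * 100.0 / total_pkts


def accuracy_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts):
    """Accuracy = 100 - error rate."""
    return 100.0 - error_rate_percent(d_in_errors, d_in_ucast_pkts, d_in_nucast_pkts)


# Illustrative deltas: 5 errors against 900 unicast and 100 non-unicast packets.
print(accuracy_percent(5, 900, 100))
```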
Measure Utilization (1)

• Utilization measures the use of a particular
 resource over time
  – Percentage in which the usage of a resource
    is compared with its maximum operational
    capacity
  – High utilization is not necessarily bad
  – A sudden jump in utilization can indicate
    an abnormal condition
Measure Utilization (2)


• Input utilization =
  (ifInOctets * 8 * 100) / ((time in seconds) * ifSpeed)
• Output utilization =
  (ifOutOctets * 8 * 100) / ((time in seconds) * ifSpeed)
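
A minimal Python sketch of the utilization formula follows. It assumes the octet value is the difference between two readings of a cumulative 32-bit counter, taken a known number of seconds apart, and that ifSpeed is in bits per second. The helper names and sample readings are hypothetical.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two counter readings, allowing for a single wrap."""
    return (current - previous) % (2 ** bits)


def utilization_percent(delta_octets, interval_s, if_speed_bps):
    """Link utilization in percent: octets * 8 * 100 / (seconds * ifSpeed)."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    return delta_octets * 8 * 100.0 / (interval_s * if_speed_bps)


# Illustrative readings: two ifInOctets polls 300 s apart on a 10 Mb/s link.
delta = counter_delta(1_000_000, 38_500_000)
print(f"input utilization = {utilization_percent(delta, 300, 10_000_000):.1f}%")
```

The same function serves for output utilization when it is fed deltas of ifOutOctets. On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect.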
Capacity planning

• The following are potential areas for
 concern:
  – CPU
  – Backplane or I/O
  – Memory
  – Interface and pipe sizes
  – Queuing, latency and jitter
  – Speed and distance
  – Application characteristics
Perform a Proactive fault analysis

• One method to perform fault management
  is through the use of the RMON alarm and
  event groups
• Another is a distributed management system that
  enables polling at a local level, with
  aggregation of data at a manager of
  managers
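
RMON alarms pair a rising threshold with a falling threshold, so that a value hovering near one limit does not generate repeated events. The Python below is a simplified sketch of that hysteresis idea only; it is not an implementation of the RMON alarm group, and the function name, thresholds and samples are hypothetical.

```python
def hysteresis_events(samples, rising, falling):
    """Return (index, kind) events using rising/falling threshold hysteresis.

    A "rising" event fires when a sample reaches the rising threshold, and is
    not repeated until a "falling" event has occurred, and vice versa.
    """
    if falling >= rising:
        raise ValueError("falling threshold must be below rising threshold")
    events = []
    rising_armed = True
    falling_armed = True
    for index, value in enumerate(samples):
        if value >= rising and rising_armed:
            events.append((index, "rising"))
            rising_armed = False
            falling_armed = True
        elif value <= falling and falling_armed:
            events.append((index, "falling"))
            falling_armed = False
            rising_armed = True
    return events


# Illustrative utilization samples (percent), rising at 90 and falling at 70.
events = hysteresis_events([80, 92, 95, 85, 93, 60, 91], rising=90, falling=70)
print(events)
```

In this run the samples 95 and 93 produce no extra events, because the value has not yet dropped to the falling threshold since the first rising event.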
Use thresholds for proactive fault
management (1/2)
• A threshold is a point of interest in a specific
  data stream; an event is generated when the
  threshold is triggered
• 2 classes of threshold for numeric data
  – Continuous thresholds apply to continuous or
    time series data, such as data stored in SNMP
    counters or gauges
  – Discrete thresholds apply to enumerated objects
    or discrete numeric data, such as Boolean
    objects
Use thresholds for proactive fault
management (2/2)
• 2 different forms of continuous threshold
  – Absolute: use with gauges
  – Relative (delta): use with counters
• Steps to determine thresholds
  – 1 Select the objects
  – 2 Select the devices and interfaces
  – 3 Determine the threshold values for each
    object or interface
  – 4 Determine the severity for the event
    generated by each threshold
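
The four steps above can be sketched as a small rule record plus an evaluator that handles both absolute and delta thresholds. This is an illustrative sketch only: the class, the field names, the "cpu_busy_percent" label, the device names, the limits and the severities are all hypothetical, and counter wrap is ignored.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    object_name: str  # step 1: the object to watch
    target: str       # step 2: the device or interface
    kind: str         # "absolute" (gauges) or "delta" (counters)
    limit: float      # step 3: the threshold value
    severity: str     # step 4: severity of the generated event


def evaluate(rule, samples):
    """Return (sample index, severity) for every sample that breaches the rule."""
    if rule.kind == "absolute":
        # Gauges: compare each reading directly with the limit.
        values = list(enumerate(samples))
    elif rule.kind == "delta":
        # Counters: compare the change between consecutive readings.
        values = [(i, samples[i] - samples[i - 1]) for i in range(1, len(samples))]
    else:
        raise ValueError(f"unknown threshold kind: {rule.kind}")
    return [(i, rule.severity) for i, value in values if value > rule.limit]


cpu_rule = ThresholdRule("cpu_busy_percent", "router-1", "absolute", 80, "major")
err_rule = ThresholdRule("ifInErrors", "router-1 / interface 2", "delta", 100, "minor")

cpu_events = evaluate(cpu_rule, [40, 85, 70, 90])
err_events = evaluate(err_rule, [1000, 1020, 1200, 1210])
print(cpu_events, err_events)
```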
Network management
implementation
• The organization should have an
  implemented network management
  system.
• SNMP/RMON or other network
  management system tools