Part Five
Incident Detection and Recording


    IT Incident Management




  IT Incident Management
aims to minimize disruption to the business
by restoring service operation to agreed
levels as quickly as possible

is often the first process instigated when introducing the
IT quality framework to a Service Desk, and
offers the most immediate and highly visible
cost reduction and quality gains.


The relationship of Incident
 Management to other IT
         processes




     Problem Management
assists Incident Management by:
providing the next path to escalation and
resolution (part of the Incident Lifecycle)

establishing root cause and Known Errors

supporting Incident Management in restoring
services

providing management reporting on historical
data and trend analysis.
       Change Management
  assists Incident Management by:

providing the Service Desk with information
on current and future change activity, as well
as change history

providing controlled implementation of
changes

providing up-to-date information to customers
on progress of change.

    Configuration Management
  assists Incident Management by:
providing valuable information on how much of
the IT infrastructure is affected, the
Configuration Item (CI) relationships and the
dependencies of other CIs

providing up-to-date information on customers,
owner and status of CIs

assisting with identification of incidents of
similar CI type.

    Service Level Management
  assists Incident Management by:

providing performance metrics on incident
response and resolution times

establishing a contact point for customers
when escalations are breached.




       Incident detection
         and recording
Detection and recording
Classification and initial support
Investigation and diagnosis
Resolution and recovery
Incident closure
Incident ownership, monitoring, tracking and communication



         Incident detection
           and recording
Throughout an Incident lifecycle, specialist
 IT groups will handle the incident at
 different stages.

To do this efficiently and effectively, a formal
 approach is required which will facilitate
 the timely restoration of service following
 an incident.

        Incident detection
          and recording
As shown, incidents come from many
sources. The Service Desk (more commonly
known as a Help Desk) is the primary point
for recording incidents, although other IT staff
can play this role as well.

The Service Desk is the single point of
contact between service providers and users
or their representatives on a day-to-day basis
and typically the owner of the Incident
Management process.

       Incident detection
         and recording
Incident tracking is highly recommended.
Information about incidents should be held
in the same Service Management tool as
the problem and change records.
Records should be cross-linked to
eliminate the need for re-keying data.
This improves information interfaces and
makes data interrogation and reporting
much easier.
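
The cross-linking described above can be sketched as a single store that holds incident, problem and change records and links them by ID. This is a minimal illustration only; the class names, the `INC-`/`PRB-`/`CHG-` ID scheme and the sample records are assumptions, not part of any particular Service Management tool.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str   # hypothetical ID scheme, e.g. "INC-1", "PRB-1", "CHG-1"
    kind: str        # "incident", "problem" or "change"
    summary: str
    links: set[str] = field(default_factory=set)  # IDs of related records


class ServiceManagementTool:
    """One store for all record types, so data is keyed in only once."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self.records[record.record_id] = record

    def link(self, id_a: str, id_b: str) -> None:
        # Store the link on both records so it can be followed either way.
        self.records[id_a].links.add(id_b)
        self.records[id_b].links.add(id_a)

    def related(self, record_id: str, kind: str) -> list[Record]:
        # Return the linked records of the requested kind, ordered by ID.
        linked = (self.records[i] for i in self.records[record_id].links)
        return sorted((r for r in linked if r.kind == kind),
                      key=lambda r: r.record_id)


tool = ServiceManagementTool()
tool.add(Record("INC-1", "incident", "Mail server unreachable"))
tool.add(Record("PRB-1", "problem", "Recurring mail server outages"))
tool.add(Record("CHG-1", "change", "Replace mail server disk controller"))
tool.link("INC-1", "PRB-1")
tool.link("PRB-1", "CHG-1")
```

With the links in place, a report such as "all incidents behind a given problem" becomes a lookup (`tool.related("PRB-1", "incident")`) rather than a manual comparison of separate systems.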
   Incident detection and
         recording

Incident priorities and escalation
procedures need to be agreed as part of
the Service Level Management process
and documented in SLAs.




  Classification

(Priority, Impact &
    Urgency).


                  Priority
One of the important aspects of managing
 an Incident is to define its priority.

How important is it, and what is the
 impact on the business?

The responsibility for this definition lies with
 the Service Level Management process.
                   Priority
The priority with which Incidents need to be
 resolved, and therefore the amount of
 effort put into the resolution of and
 recovery from Incidents will depend upon:

  1.   The impact on the business
  2.   The urgency to the business
  3.   The size, scope and complexity of the
       Incident
  4.   The availability of resources for coping in
       the meantime and for correcting the fault.

                   Impact
'Impact' is a measure of the business criticality
   of an Incident or Problem.
Often this equates to the extent to which an
   Incident can lead to degradation of agreed
   service levels.
Impact is often measured by the number of
   people or systems affected.
 Criteria for assigning impact should be set up
   in consultation with the business managers
   and formalized in SLAs.
                  Impact

When determining impact, information in the
 Configuration Management Database
 (CMDB) should be assessed to detect how
 many users will suffer as a result of the
 technical failure of, for example, a hardware
 component.
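
One way to picture this assessment is a walk through the CI dependencies recorded in the CMDB, collecting the users of everything that depends on the failed component. The sketch below assumes a tiny in-memory extract; the CI names, user names and the `affected_users` function are hypothetical and stand in for whatever query interface a real CMDB offers.

```python
from collections import deque

# Hypothetical CMDB extract: each CI maps to the CIs that depend on it.
DEPENDENTS = {
    "disk-array-1": ["db-server-1"],
    "db-server-1": ["payroll-app", "crm-app"],
    "payroll-app": [],
    "crm-app": [],
}

# Hypothetical mapping of CIs to the users who rely on them directly.
USERS = {
    "payroll-app": {"alice", "bob"},
    "crm-app": {"bob", "carol", "dave"},
}


def affected_users(failed_ci: str) -> set[str]:
    """Breadth-first walk from the failed CI through all dependent CIs."""
    seen = {failed_ci}
    queue = deque([failed_ci])
    users: set[str] = set()
    while queue:
        ci = queue.popleft()
        users |= USERS.get(ci, set())
        for dependent in DEPENDENTS.get(ci, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return users
```

Here a failure of `disk-array-1` reaches both applications through `db-server-1`, so all four users are counted, while a failure of `payroll-app` alone affects only its two direct users.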




                      Impact
The Service Desk should have access to tools that
  enable it to rapidly:

       Assess the impact of significant equipment failures
       on users

       Identify users affected by equipment failure

       Establish contact with users to make them aware of
       the issue

       Provide a prognosis

       Alert second-line (specialist) support groups, if
       appropriate
                 Urgency
'Urgency' is about the necessary speed in
  solving an Incident of a certain impact. A
  high-impact Incident does not, by default,
  have to be solved immediately.
For example, a User having operational
  difficulties with his workstation (impact
  'high') can have the fault registered with
  urgency 'low' if he is leaving the office for a
  holiday after reporting the Incident.

                 Urgency
'Urgency' is assessed by the degree to which the
   service is affected (stopped, partially
   affected, functionally changed).
If a user calls with an Incident and he/she
   can’t work (service stopped) then it is of
   greater urgency than a user calling to
   request a functionality change.


     Investigation and diagnosis
Once logged, the activity of investigation
   and diagnosis will take place.
If the Service Desk can’t solve an Incident, it
   will be assigned to other support levels.
They will then investigate the Incident using
   the available skills and diagnose the
   problem.



       Resolution and Recovery

Once a solution to the incident is found, it
   will be implemented.
If a change is needed, a request for change
   will be submitted to Change Management.




          Incident Closure



For the Incident Management
 process to be effective, it is
 necessary that the Incident’s
 closure be done properly.



            Incident Closure

To ensure that the solution provided meets the
 user's needs, the user is the only person who
 can authorize the closure of an Incident.

The Incident record in the Service Desk tool
 should be ‘closed’ so that accurate
 reporting can be carried out.
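
The closure rule can be sketched as a small state check on the Incident record. This is an illustrative sketch only: the `Incident` class, the intermediate "resolved" status and the field names are assumptions, not features of a specific Service Desk tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    reported_by: str
    status: str = "open"  # assumed lifecycle: "open" -> "resolved" -> "closed"

    def resolve(self) -> None:
        # Support staff mark the incident resolved once a solution is applied.
        self.status = "resolved"

    def close(self, authorized_by: str) -> None:
        # Closure needs a resolved incident and the reporting user's consent.
        if self.status != "resolved":
            raise ValueError("only a resolved incident can be closed")
        if authorized_by != self.reported_by:
            raise PermissionError("only the reporting user can authorize closure")
        self.status = "closed"
```

Keeping "resolved" and "closed" as separate states lets reports distinguish incidents that still await user confirmation from those that are finished.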


            Incident Closure

An Incident will be closed as soon as the
 agreed service is restored.

In some cases the Incident record is closed
  but a Problem record is still open.




   Incident ownership, monitoring,
     tracking and communication

Whilst an Incident may be passed across
   different IT groups during investigation and
   diagnosis, the Service Desk remains the
   owner of the Incident (in terms of tracking
   through to closure).
It will monitor the progress of the Incident in
   light of service levels and
   maintain/manage communication with the
   user.
   Incident ownership, monitoring,
     tracking and communication

If the Incident is not progressing
   appropriately, then the Service Desk
   may trigger either a functional or
   hierarchical escalation.




   Incident ownership, monitoring,
     tracking and communication

A functional escalation is where the incident
  is passed to a different part (or function)
  within the organization.
As the name implies, a hierarchical
  escalation sees the incident brought to the
  attention of a person who holds a position
  of supervision/management.
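
The difference between the two escalation types can be sketched as two lookups: one moves the incident sideways to another function, the other moves it up the management line. The support groups and manager roles below are hypothetical examples, and the ordering of support lines is an assumption.

```python
from typing import Optional

# Hypothetical support structure, ordered from first line to most specialised.
SUPPORT_LINES = ["service-desk", "second-line", "third-line"]

# Hypothetical supervisor for each support group.
MANAGERS = {
    "service-desk": "service-desk-manager",
    "second-line": "it-operations-manager",
    "third-line": "it-director",
}


def functional_escalation(current_group: str) -> Optional[str]:
    """Pass the incident to the next support function, if there is one."""
    position = SUPPORT_LINES.index(current_group)
    if position + 1 < len(SUPPORT_LINES):
        return SUPPORT_LINES[position + 1]
    return None  # no further function to hand over to


def hierarchical_escalation(current_group: str) -> str:
    """Bring the incident to the attention of the group's supervisor."""
    return MANAGERS[current_group]
```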


                 Benefits
A well-implemented Incident Management
  process will have easily visible benefits.
Unlike some other IT processes where
  benefits may be hard for end users to
  identify, the benefits of good incident
  management will be felt by them directly.




Benefits for Customers

Restores service quickly following an
Incident
Ensures Incidents are not lost or forgotten
Provides up-to-date status of their Incident




       Benefits for the IT
         organization
   Removes likely sources of “duplication of
    effort” (once an Incident is solved, the
    resolution will be easily found and can be
    applied to future similar incidents).
   Provides a clear view of the status and
    priorities of the Incidents
   Measures performance against SLAs
    where possible
   Gives higher user and customer
    satisfaction
         Benefits for the IT
           organization
The whole question of being able to use information
  from previously solved incidents is one that should
  be discussed in your organization.
Once an incident is solved and underlying causes
  are removed (through Problem Management) we
  shouldn’t see that incident occur again.
Most re-work is done because the incident record
  didn’t contain information or key words that the
  person faced with a similar issue in the future
  would use to search on.

      Benefits for the IT
        organization
Lays down rules for prioritization; the high
impact, high urgency incidents are the
ones that jump to the front of the queue
resulting in the least possible impact on
the business activities.
Ensures quicker resolution of Incidents
(productivity gains).
Provides management information.
        Benefits for the IT
          organization
Defining benefits is relatively easy. Realizing
   benefits is difficult.
It is human nature to expect a quick
   response to a reported incident.




          Benefits for the IT
            organization
This is where the communication skills of IT staff must be
  well practiced.

Careful selection of the words used to explain that an incident
  must wait its turn can be learned; for instance, acknowledge
  the frustration the user is facing and provide a very brief
  overview of the things ahead of them in the queue.




       Benefits for the IT
         organization
We know there are many benefits of a good
 incident management process.
The following major obstacles, if not dealt
 with, will mean the process will be
 inefficient and ultimately unsuccessful.



Successful Incident Management:

A CMDB (Configuration Management
Database) needs to be set up before
Incident Management is implemented.
This makes the determination of impact
and urgency a lot faster.




 Successful Incident Management:

A knowledge database:

This database will hold Known Errors,
 workarounds and resolutions.
This will help Incidents to be resolved much
 faster and with less effort.
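
A knowledge database of this kind is only useful if records can be found from the words a user is likely to report. The sketch below assumes a simple keyword match; the `KnownError` structure, the sample entries and the scoring by number of shared keywords are illustrative assumptions rather than a description of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    error_id: str
    keywords: frozenset[str]
    workaround: str


# Hypothetical sample entries.
KNOWLEDGE_BASE = [
    KnownError("KE-1", frozenset({"printer", "offline", "queue"}),
               "Restart the print spooler"),
    KnownError("KE-2", frozenset({"vpn", "timeout"}),
               "Reconnect using the backup gateway"),
]


def search(description: str) -> list[KnownError]:
    """Return known errors sharing keywords with the description, best match first."""
    # Naive tokenisation: lower-case and split on whitespace (punctuation is kept).
    words = set(description.lower().split())
    scored = [(len(words & ke.keywords), ke) for ke in KNOWLEDGE_BASE]
    hits = [(score, ke) for score, ke in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [ke for _, ke in hits]
```

A call such as `search("Printer shows offline")` returns the printer entry first, which is why the keywords recorded at resolution time matter as much as the workaround itself.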



 Successful Incident Management:

An Incident Management tool to record
 and monitor Incidents easily.




           Successful Incident
             Management:

The challenge in the implementation of these
  tools and databases is not to let the work of
  setting up the system stand in the way of
  making progress.




         Successful Incident
           Management:

The biggest challenge facing IT professionals is
  the "discipline" it takes to use the tools and
  procedures.
The sooner the discipline of logging information
  and searching for solutions, rather than
  reworking them, can begin, the better.
 Successful Incident Management:

If people start to use the tools and start to
   see benefits in doing so, then the proper
   "habits" are formed.
It is then a relatively easy task to modify
   behaviours to use a different tool or
   introduce new features/functionality as tool
   development or tool selection progresses.


IBM Global Services
    (example)



Priority order for handling incidents is primarily
defined by impact and urgency.
A simple priority matrix example:

                 Low impact                   High impact
 High urgency    Priority 2                   Priority 1
                 Limited damage, should       Significant damage, must
                 be recovered immediately     be recovered immediately
 Low urgency     Priority 4                   Priority 3
                 Limited damage, does not     Significant damage, does not
                 need to be recovered         need to be recovered
                 immediately                  immediately
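
The matrix above can be read as a lookup from impact and urgency to a priority. A minimal sketch, assuming impact and urgency are each reduced to a high/low flag (the function name `priority` and its parameters are illustrative, not taken from the slides):

```python
def priority(high_impact: bool, high_urgency: bool) -> int:
    """Map the two axes of the example matrix to a priority from 1 to 4."""
    matrix = {
        (True, True): 1,    # significant damage, must be recovered immediately
        (False, True): 2,   # limited damage, should be recovered immediately
        (True, False): 3,   # significant damage, not needed immediately
        (False, False): 4,  # limited damage, not needed immediately
    }
    return matrix[(high_impact, high_urgency)]
```

Note that in this example urgency outranks impact: a limited-damage incident that must be recovered immediately (Priority 2) is handled before a significant-damage incident that can wait (Priority 3).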
                             Incident Management

       Each priority is related to a certain recovery time…

     Priority 1: Significant damage, must be
      recovered immediately                         1 hour
     Priority 2: Limited damage, should be
      recovered immediately                         2 hours
     Priority 3: Significant damage, does not
      need to be recovered immediately              4 hours
     Priority 4: Limited damage, does not
      need to be recovered immediately              8 hours
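
Given recovery times like these, the target time for an incident follows directly from its priority and the time it was logged. The sketch below assumes the example values of 1, 2, 4 and 8 hours for priorities 1 to 4 and uses plain clock time; a real SLA may count only service hours, which this does not model.

```python
from datetime import datetime, timedelta

# Example recovery times per priority, as in the slide's example.
RECOVERY_TIME = {
    1: timedelta(hours=1),
    2: timedelta(hours=2),
    3: timedelta(hours=4),
    4: timedelta(hours=8),
}


def recovery_deadline(priority: int, logged_at: datetime) -> datetime:
    """Return the time by which service should be restored."""
    return logged_at + RECOVERY_TIME[priority]
```

For example, a Priority 3 incident logged at 09:00 has a recovery target of 13:00 the same day.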


           Incident Management
       … This results in an appropriate escalation
Functional versus hierarchical escalation

[Figure: escalation diagram. Labels: Managing Board; Information / Support (Escalation); Hierarchical Escalation; Incident Mgt.; Functional Escalation; Transfer or integration of further knowledge carriers.]

                               Incident Management

     Escalation – Escalation Levels
     Escalation levels ensure that for repeated
     occurrences of an escalation trigger, the
     corresponding measures are intensified,
     increased, or changed.

     Reasons for increasing the escalation level
     can be the following:
        reaction time about to be exceeded or already expired;
        recovery time about to be exceeded or already expired;
        very high priority of the disrupted service.
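
The three reasons listed above can be checked mechanically for an incident that is still open. This is a sketch under stated assumptions: a deadline counts as "threatened" once 80% of it has elapsed (`WARN_FRACTION`), priority 1 is treated as "very high", and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a deadline is "threatened" once 80% of the allowed time has elapsed.
WARN_FRACTION = 0.8


def escalation_reasons(now: datetime, logged_at: datetime,
                       reaction_time: timedelta, recovery_time: timedelta,
                       responded: bool, priority: int) -> list[str]:
    """List the reasons, if any, for raising the escalation level of an open incident."""
    elapsed = now - logged_at
    reasons = []
    # Reaction time only matters while nobody has responded yet.
    if not responded and elapsed >= WARN_FRACTION * reaction_time:
        reasons.append("reaction time")
    if elapsed >= WARN_FRACTION * recovery_time:
        reasons.append("recovery time")
    if priority == 1:  # assumption: priority 1 is the "very high" priority
        reasons.append("very high priority")
    return reasons
```

Running such a check at regular intervals, and raising the level each time it returns a non-empty list, is one way to get the intensifying measures that escalation levels are meant to provide.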


            Escalation – Example of an
                Escalation Matrix

                                                  Escalation steps
Priority                            Measures      0    1    2    3    4
1  very urgent and very important   to inform
                                    measures
2  urgent and important             to inform
                                    measures
3  urgent and not important         to inform
                                    measures
4  not urgent and important         to inform
                                    measures
5  not urgent and not important     to inform
                                    measures
