Evaluating Provider Reliability in Risk-aware Grid Brokering
Iain Gourlay
Outline

   •   AssessGrid background
   •   Problem Statement
   •   Basic Reliability
   •   Analysis of behaviour
   •   Stationarity Problem
   •   Weighted Reliability
   •   Simulations and Results
   •   What if a provider is unreliable?
   •   Alternative: Bayesian Inference
   •   Summary and Conclusions

AssessGrid Background
   • AssessGrid addresses risk management in the
       Grid.
   •   This is a necessity in the drive towards
       commercialisation of Grid technology…
        - The goal is to move beyond best-effort service, using
           SLAs to specify an agreed-upon level of service. However:
        - For resource providers, offering an SLA with service
           guarantees and penalties is a business risk!
        - For end-users, agreeing to an SLA is a business risk!
   •   A large part of AssessGrid is concerned with
       supporting providers with tools and methods
       to:
        - Monitor and collect useful data.
        - Assess the risk associated with accepting an SLA
           request, based on this data.


What is risk?

   • Risk is “Hazard, danger, exposure to mischance or
       peril” (Oxford English Dictionary).
   •   Risk Management is a discipline that addresses the
       possibility that future events may cause adverse
       effects.
        - Economics, Operations Research, Engineering, Gambling,
          …
   •   In Risk Management, risk is quantified with two
       parameters:
          Risk = Probability of Occurrence x Impact

   •   In Grid computing, the event is SLA failure!
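
As a minimal illustration of the formula above, the sketch below treats risk as an expected loss. The function name and the penalty figure are hypothetical and chosen only for illustration; they do not come from AssessGrid.

```python
def risk(probability_of_failure: float, impact: float) -> float:
    """Risk = probability of occurrence x impact (here: an expected penalty)."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability_of_failure * impact


# Hypothetical SLA: 5% PoF and a penalty of 200 (arbitrary currency units).
expected_loss = risk(0.05, 200.0)
print(expected_loss)
```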


Scenario




Role of the Broker

   •   Key role: Finding/Negotiating with providers
       on behalf of end-users.
   •   Broker can also act as an independent party:
       - Providers may have motivation to lie!
       - Providers may have unidentified problems in their
         infrastructure.
   •   Here, we assume the broker is independent
       and honest.
   •   Broker can give a second opinion on risk
       assessments.
   •   Broker can agree its own SLAs (virtual
       provider).

Problem statement: What do we mean by
reliability?

   •   A provider makes an SLA offer:
       -   includes an estimate of the Probability of Failure
           (PoF).
   •   Each time an offer is accepted, the details
       are stored in a database, including:
       -   Final status (Success/Fail)
       -   Offered PoF
   • The problem is:
   Given a provider’s past data, can their risk
     assessments be considered reliable?


What is reliable?

   • Considering only systematic errors!
   • Assume s SLAs in the database for the same
       provider.
        - Offered PoFs: $p_i$, $i = 0, \ldots, s-1$
   •   Assume the number of fails is normally distributed,
       $\text{Fails} \sim N(\mu_F, \sigma_F^2)$

   •   We define a reliable provider as one that does
       not systematically underestimate or
       overestimate the PoF, so that:

              $$(1-\alpha)\, F_{pred} \le \mu_F \le (1+\alpha)\, F_{pred}$$


Is it normal?

   [Figure: Normal vs Binomial, p=0.05, n=50. x-axis: number of fails; y-axis: f(number of fails); series: Binomial, Normal.]

Is it normal? (2)

   [Figure: Normal vs Binomial, p=0.05, n=100. x-axis: number of fails; y-axis: f(number of fails); series: Binomial, Normal.]

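The comparison in the two "Normal vs Binomial" plots can be reproduced numerically. The sketch below uses only the Python standard library to evaluate the exact binomial probabilities and the matching normal density for the slide parameters (p=0.05, n=50 and n=100). The helper names are illustrative and not part of the original material.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k failures among n independent SLAs with PoF p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def max_abs_gap(n: int, p: float) -> float:
    """Largest pointwise gap between the binomial pmf and its normal approximation."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1))


for n in (50, 100):
    print(n, max_abs_gap(n, 0.05))
```

The gap shrinks as n grows, which is the reason the normal assumption becomes more comfortable for larger SLA databases.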
Basic Reliability: Identifying Systematic Errors

    Using the provider’s offered PoFs:

        $$F_{pred} = \sum_{i=0}^{s-1} p_i, \qquad \sigma_0 = \sqrt{\sum_{i=0}^{s-1} p_i (1 - p_i)}$$

    The evaluation is based on the following
       measure:

        $$R = \frac{F_{pred} - \text{Fails}}{\sigma_0}$$

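A minimal sketch of the basic measure, assuming the database rows for one provider are available as a list of offered PoFs plus a count of failed SLAs. The function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def basic_reliability(offered_pofs: Sequence[float], fails: int) -> float:
    """R = (F_pred - Fails) / sigma_0 for one provider's SLA history."""
    f_pred = sum(offered_pofs)  # predicted number of fails
    sigma_0 = math.sqrt(sum(p * (1 - p) for p in offered_pofs))
    return (f_pred - fails) / sigma_0


# Illustrative history: 100 SLAs, each offered with PoF = 0.05.
r_as_predicted = basic_reliability([0.05] * 100, fails=5)
r_more_fails = basic_reliability([0.05] * 100, fails=9)
print(r_as_predicted, r_more_fails)
```

With this sign convention a negative R means more failures than the provider predicted (the PoF was underestimated), and a positive R means fewer.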
Basic Reliability: Identifying Systematic Errors (2)

   [Figure: Distribution of R for Reliable Providers. x-axis: R (-6 to 6); y-axis: f(R); series: Perfect Provider, Overestimates PoF, Underestimates PoF.]

Basic Reliability: Identifying Systematic Errors (3)
    We note that

        $$R \sim N\left( \frac{F_{pred} - \mu_F}{\sigma_0},\ \left( \frac{\sigma_F}{\sigma_0} \right)^2 \right)$$

    and recall the condition $(1-\alpha)\, F_{pred} \le \mu_F \le (1+\alpha)\, F_{pred}$,
      leading to

        $$|R_{observed}| \le \alpha \frac{F_{pred}}{\sigma_0} + z \frac{\sigma_F}{\sigma_0}$$

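The bound on the observed R can be turned into a simple check, sketched below. The true ratio sigma_F / sigma_0 is not observable by the broker, so the `sigma_ratio` parameter and its default of 1.0 are an assumption made here, not something the slides specify. The defaults alpha=0.1 and z=2 are the values quoted in the results slides.

```python
import math
from typing import Sequence


def is_reliable(
    offered_pofs: Sequence[float],
    fails: int,
    alpha: float = 0.1,
    z: float = 2.0,
    sigma_ratio: float = 1.0,  # assumed stand-in for sigma_F / sigma_0
) -> bool:
    """True if |R_observed| stays within alpha * F_pred / sigma_0 + z * sigma_ratio."""
    f_pred = sum(offered_pofs)
    sigma_0 = math.sqrt(sum(p * (1 - p) for p in offered_pofs))
    r_observed = (f_pred - fails) / sigma_0
    threshold = alpha * f_pred / sigma_0 + z * sigma_ratio
    return abs(r_observed) <= threshold


history = [0.05] * 1000  # illustrative: 1000 SLAs offered at PoF = 0.05
print(is_reliable(history, fails=50), is_reliable(history, fails=90))
```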
Analysis: How does the measure behave?

   Simple Example:
   • m SLAs in database.
   • Offered PoF is constant, p.
   • There is a systematic
     overestimation/underestimation of the PoF, such
     that:

        $$p_{fail} = p\,(1 + \delta)$$

Analysis (2)
   [Figure: distribution of R. x-axis: R (-6 to 6); left y-axis: Probability Density; right y-axis: Cumulative Probability; annotated region: 95.35%.]

  Punreliable    0  
                                               mp                 
                                                               1 p  z       
                                                                          0 
                                                                                            mp                                    
                                                                                                                                   1 p  z 
                                                                                                                                            
                       
                                                      F   0        
                                                                        
                                                                               
                                                                                                         F                    0        
                                                                                                                                            


   F  mp1   1  p1   


                                                                                                                                            15
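For the simple example, the probability of a provider being flagged follows from the normal model for R. The sketch below assumes the flagging rule |R| > alpha * F_pred / sigma_0 + z * sigma_F / sigma_0 and writes the true PoF as p(1 + delta). The name `delta` and the function names are choices made here for illustration.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_unreliable(m: int, p: float, delta: float, alpha: float = 0.1, z: float = 2.0) -> float:
    """Probability that |R| exceeds the threshold when the true PoF is p * (1 + delta)."""
    p_fail = p * (1.0 + delta)
    sigma_0 = math.sqrt(m * p * (1.0 - p))
    sigma_f = math.sqrt(m * p_fail * (1.0 - p_fail))
    mean_r = (m * p - m * p_fail) / sigma_0  # (F_pred - mu_F) / sigma_0
    sd_r = sigma_f / sigma_0
    threshold = alpha * m * p / sigma_0 + z * sd_r
    lower_tail = phi((-threshold - mean_r) / sd_r)
    upper_tail = 1.0 - phi((threshold - mean_r) / sd_r)
    return lower_tail + upper_tail


# Offered PoF 0.05; a perfect provider (delta = 0) and one whose true PoF is 0.07.
for m in (1000, 3000, 6000):
    print(m, p_unreliable(m, 0.05, 0.0), p_unreliable(m, 0.05, 0.4))
```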
Stationarity Problem

   •   Conditions are not static!
       -   Example: a bag holds 60 red balls and 40 blue balls.
       You try to estimate the number of red balls
        by taking a ball out and replacing it,
        repeating this 50 times.
       Someone is secretly removing a red ball and
        replacing it with a blue one after every sample.
       Expected number of reds drawn ≈ 17.5 (out of 50)
       Number of reds left in the bag = 10!
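
The arithmetic behind the example can be checked directly. The sketch below computes the exact expected number of red draws with rational arithmetic. Whether the first swap happens before or after the first draw is a convention assumed here: the two choices give 17.75 and 17.25, and the slide's 17.5 sits between them.

```python
from fractions import Fraction


def expected_reds_drawn(
    reds: int = 60,
    total: int = 100,
    draws: int = 50,
    swap_before_draw: bool = False,
) -> Fraction:
    """Expected red draws when one red ball is swapped for a blue one per sample."""
    offset = 1 if swap_before_draw else 0
    return sum(Fraction(reds - (t + offset), total) for t in range(draws))


expected = expected_reds_drawn()
naive_estimate_of_reds = expected / 50 * 100  # what the sampler would infer
reds_left = 60 - 50                           # what is actually in the bag
print(float(expected), float(naive_estimate_of_reds), reds_left)
```

The sampler would infer roughly 35 red balls, while only 10 remain: an estimate built on stale data can be badly off.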


Stationarity Problem(2)

   A provider’s behaviour could change as a
     consequence of a variety of factors, e.g.

   •   A provider’s infrastructure is updated.

   •   A provider’s risk assessment methodology or
       model parameterisation may change.

   •   A provider’s policy may change, for example
       due to economic considerations.

Weighted Reliability
   • Use a weighted average, ensuring more recent
      SLAs have a larger influence.

   • A total of mk SLAs is split into k categories, with
      the kth consisting of the most recent SLAs.

        $$R_W = \frac{\sum_{i=1}^{k} \omega_i\, R(m_i)}{\sum_{j=1}^{k} \omega_j}$$

   Here, $R(m_i)$ is the basic measure R over the ith
     category, with weights

        $$\omega_i = i + 1$$

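A minimal sketch of the weighted measure, assuming each category is stored as a pair of (offered PoFs, number of fails) and that categories are ordered oldest first. The linear weights (i + 1 for the ith category, i = 1..k) are an assumption about the weighting scheme, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Category = Tuple[Sequence[float], int]  # (offered PoFs, number of fails)


def basic_r(offered_pofs: Sequence[float], fails: int) -> float:
    """Basic measure R for a single category."""
    f_pred = sum(offered_pofs)
    sigma_0 = math.sqrt(sum(p * (1 - p) for p in offered_pofs))
    return (f_pred - fails) / sigma_0


def weighted_reliability(categories: Sequence[Category]) -> float:
    """Weighted average of per-category R; categories[-1] is the most recent."""
    weights = [i + 1 for i in range(1, len(categories) + 1)]  # assumed weights
    scores = [basic_r(pofs, fails) for pofs, fails in categories]
    return sum(w * r for w, r in zip(weights, scores)) / sum(weights)


old_block = ([0.05] * 100, 5)     # behaved as predicted
recent_block = ([0.05] * 100, 9)  # more fails than predicted
r_w = weighted_reliability([old_block, recent_block])
print(r_w)
```

Because the recent block carries the larger weight, the weighted score moves further from zero than a plain average of the two blocks would.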
Simulations

   •   A database of SLAs is generated:
       -   Each SLA object has an offered PoF, a true PoF and
           a final status.
   •   The reliability measure is computed.
   •   The process is repeated 10000 times for each
       scenario.
   •   Simple case considered here:
       - Offered PoF is fixed and true PoF is fixed.

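The simulation loop described above can be sketched as follows for the simple case of fixed offered and true PoF. This is not the original simulator: the function name, the seeding, and the use of the true PoF to evaluate sigma_F in the threshold are assumptions made for this sketch (a real broker would not know the true PoF). The run count is reduced from 10000 to keep the example quick.

```python
import math
import random


def simulate_p_unreliable(
    m: int,
    offered: float,
    true_pof: float,
    alpha: float = 0.1,
    z: float = 2.0,
    runs: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of simulated SLA databases in which the provider is flagged."""
    rng = random.Random(seed)
    f_pred = m * offered
    sigma_0 = math.sqrt(m * offered * (1 - offered))
    sigma_f = math.sqrt(m * true_pof * (1 - true_pof))  # uses the true PoF
    threshold = alpha * f_pred / sigma_0 + z * sigma_f / sigma_0
    flagged = 0
    for _ in range(runs):
        # Final status of each SLA: fail with probability true_pof.
        fails = sum(rng.random() < true_pof for _ in range(m))
        if abs((f_pred - fails) / sigma_0) > threshold:
            flagged += 1
    return flagged / runs


perfect = simulate_p_unreliable(1000, 0.05, 0.05, runs=500)
doubled = simulate_p_unreliable(1000, 0.05, 0.10, runs=500)
print(perfect, doubled)
```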
Results

   [Figure: "Perfect" Provider, PoF=0.05, alpha=0.1, z=2. x-axis: Number of SLAs (0 to 6000); y-axis: Prob(found unreliable); series: Simulation, Analytic.]

Results (2)

   [Figure: Unreliable Provider, offered PoF=0.05, true PoF=0.07, alpha=0.1, z=2. x-axis: Number of SLAs (0 to 6000); y-axis: Prob(found unreliable); series: Simulation, Analytic.]

Results (3)

   [Figure: Unreliable Provider, offered PoF=0.05, true PoF=0.03, alpha=0.1, z=2. x-axis: Number of SLAs (0 to 6000); y-axis: Prob(found unreliable); series: Weighted, Basic, Moving Average.]

Results (4)

   [Figure: Provider unreliable > 500 jobs: true PoF = 2 x offered PoF (= 0.05). x-axis: Number of Jobs (0 to 2000); y-axis: Prob(unreliable); series: Basic, Moving Average, Weighted.]

Results (5)

   [Figure: Provider unreliable > 500 jobs: true PoF = 1.5 x offered PoF (= 0.05). x-axis: Number of Jobs (0 to 3500); y-axis: Prob(unreliable); series: Basic, Moving Average, Weighted.]

What if the provider is unreliable?


   •   Discrete approximation: when an SLA offer is
       received with an offered PoF of p, estimate the
       PoF by looking at the failure rate for all SLAs
       with an offered PoF of ~p.
       Then,
       If (|reliability measure| < threshold): believe the provider.
       Else: PoF estimate =
          numFails(PoF~p) / numSLAs(PoF~p)
   •   Use all SLAs with an offered PoF within x% of
       the offered PoF in the current SLA.
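
The rule above can be sketched as a small function. The history format (pairs of offered PoF and a failed flag), the 10% default window, and the fallback to the offered PoF when no similar SLAs exist are all assumptions made for this illustration.

```python
from typing import Sequence, Tuple


def broker_pof_estimate(
    offered_pof: float,
    history: Sequence[Tuple[float, bool]],  # (offered PoF, failed?) per past SLA
    provider_is_reliable: bool,
    window_pct: float = 10.0,               # the "x%" window; value assumed
) -> float:
    """Believe a reliable provider; otherwise use the observed failure rate near p."""
    if provider_is_reliable:
        return offered_pof
    low = offered_pof * (1 - window_pct / 100)
    high = offered_pof * (1 + window_pct / 100)
    similar = [failed for pof, failed in history if low <= pof <= high]
    if not similar:
        return offered_pof  # assumed fallback when no comparable SLAs exist
    return sum(similar) / len(similar)


past = [(0.05, True)] * 2 + [(0.05, False)] * 18 + [(0.20, True)] * 10
estimate = broker_pof_estimate(0.05, past, provider_is_reliable=False)
print(estimate)
```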

Weighted Average risk assessment

   • Split km SLAs into k categories.
   • Compute the estimated PoF, $p_i$, for each category,
     i=0,…,k-1.

        $$p_{br} = \frac{\sum_{i=0}^{k-1} \omega_i\, p_i}{\sum_{j=0}^{k-1} \omega_j}$$

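A short sketch of the weighted PoF estimate, assuming each category is summarised as (fails, number of SLAs) with the oldest category first. The linear weights 1..k are an assumption, and the names are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_pof(categories: Sequence[Tuple[int, int]]) -> float:
    """Weighted average of per-category failure rates; later categories weigh more."""
    weights = [i + 1 for i in range(len(categories))]  # assumed weights 1..k
    rates = [fails / count for fails, count in categories]
    return sum(w * p for w, p in zip(weights, rates)) / sum(weights)


# Three blocks of 100 SLAs; the most recent block fails twice as often.
p_br = weighted_pof([(5, 100), (5, 100), (10, 100)])
print(p_br)
```

The plain failure rate over all 300 SLAs is 20/300, so the weighted estimate sits closer to the recent behaviour.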
Never Trust Doctors

   •   You are tested for a disease, which 2% of the
       population has.
   •   The test never gives a false-negative.
   •   If you are clear, there is still a 5% chance of
       a false positive.
   •   You test positive.
   •   What is the probability you have the
       disease?
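
The answer follows from Bayes' theorem, worked through in the sketch below with the numbers from the slide. The function and parameter names are illustrative.

```python
def posterior_disease(
    prevalence: float = 0.02,
    false_positive_rate: float = 0.05,
    sensitivity: float = 1.0,  # the test never gives a false negative
) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


probability = posterior_disease()
print(round(probability, 3))
```

The result is about 29%: most positive results come from the much larger healthy population, which is why the prior matters.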




Alternative Approach: Bayesian Inference

    •   The provider offers a linguistic risk
        assessment, e.g. the failure probability is:
        - “extremely low”    : <1%
        - “very low”         : 1-5%
        - “low”              : 5-10%
        - “medium”           : 10-20%
        - “high”             : 20-30%
        - “very high”        :30-50%
        - “extremely high”   : >50%
    •   If the broker/end-user requests the exact PoF
        value, this can be provided.
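
The mapping from a numeric PoF to a linguistic category can be sketched as a lookup. The slide does not say which category a value exactly on a boundary belongs to; assigning interior boundary values to the higher category is an assumption made here, as are the names.

```python
import bisect

# Interior upper bounds of the categories below "extremely high".
BOUNDS = [0.01, 0.05, 0.10, 0.20, 0.30]
LABELS = ["extremely low", "very low", "low", "medium", "high", "very high"]


def linguistic_category(pof: float) -> str:
    """Map a probability of failure in [0, 1] to its linguistic label."""
    if not 0.0 <= pof <= 1.0:
        raise ValueError("PoF must lie in [0, 1]")
    if pof > 0.50:
        return "extremely high"
    return LABELS[bisect.bisect_right(BOUNDS, pof)]


for value in (0.005, 0.03, 0.07, 0.15, 0.25, 0.40, 0.60):
    print(value, linguistic_category(value))
```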

Alternative Approach: Bayesian Inference (2)

    •   The broker does not consider the provider’s
        reliability directly. Instead it takes the
        following approach:
        -   Having received a linguistic risk assessment for
            a new SLA, the broker first computes a prior
            distribution for the PoF, given the linguistic
            category, by considering data across all other
            providers.
        -   The broker computes a posterior distribution,
            based on the failure rate observed in past SLAs
            from the same provider with the same linguistic
            risk assessment.
        -   The broker returns an object which contains
            (PoF_broker, confidence).
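One way to realise the prior-then-posterior steps is a Beta-Binomial model, sketched below. This is a swapped-in textbook technique, not the method from the slides: the Beta prior, its method-of-moments fit to other providers' failure rates, and the use of the posterior standard deviation as a stand-in for "confidence" are all assumptions, as are the names.

```python
import math
from typing import Sequence, Tuple


def beta_prior_from_rates(rates: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments Beta(a, b) fit to other providers' failure rates."""
    n = len(rates)
    mean = sum(rates) / n
    var = sum((r - mean) ** 2 for r in rates) / (n - 1)
    if not 0.0 < var < mean * (1 - mean):
        raise ValueError("rates do not admit a Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def broker_assessment(other_rates: Sequence[float], fails: int, total: int) -> Tuple[float, float]:
    """Return (PoF_broker, posterior standard deviation) for one provider."""
    a, b = beta_prior_from_rates(other_rates)
    a_post, b_post = a + fails, b + (total - fails)  # conjugate Beta update
    mean = a_post / (a_post + b_post)
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, math.sqrt(var)


# Other providers in the same linguistic category failed 2%, 3% and 4% of the time;
# this provider failed 10 of its 100 SLAs in that category.
pof_broker, spread = broker_assessment([0.02, 0.03, 0.04], fails=10, total=100)
print(pof_broker, spread)
```

The posterior mean lands between the prior mean from other providers and this provider's own observed failure rate, and moves towards the latter as more of its SLAs are observed.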
Alternative Approach: Bayesian Inference (3)

        $$E[p] = \frac{1}{N-1} \sum_{i=1}^{N-1} \frac{1}{2} \left( \frac{v_i}{s_i} + \frac{v_{curr}}{s_{curr}} \right)$$

Summary/Conclusions

   •   A detailed analysis has been carried out for a
       method to identify providers who are
       systematically unreliable.
   •   The stationarity problem has been
       addressed.
       -   Weighted Average
       -   Results indicate good performance relative to
           basic measure and moving average.
   •   This can be extended to other measures for
       “non-systematic” errors.
   •   Bayesian approach has been considered and
       is also promising.
