Who is more adaptive?

A.C.M.E.
(Adaptive Caching using Multiple Experts)

WDAS, March 2002

Ismail Ari, Ahmed Amer,
Ethan L. Miller, Scott Brandt, Darrell D. E. Long

Introduction
• We describe an adaptive caching technique that finds the current best policy, or mixture of policies, for a workload
• Requires no manual tuning
• Allows exclusive caching without message exchanges
• Enables scalable distributed caching clusters and scalable distributed data access


Motivations for Proxy Caching
• Exponentially growing number of clients on the Internet
• These clients (we) access data and objects distributed all over the world
• Everybody wants fast response times
• Latency:
   – some objects are far away
   – some objects are very popular and create hot spots on networks and servers
• Distributed caches bring data closer to the clients and enable data sharing
   – They reduce latency, network load, and server load


Summarizing Proverb

"With twenty five years of Internet experience we've learned one way to deal with exponential growth: caching."
                       - Van Jacobson



Problems with Caching
• Cache sizes are dwarfed by the unique document space accessed by clients
• Only the most "valuable" objects should be cached
• What defines "most valuable"?

• Different cache replacement policies assign value or priority differently
   – They combine multiple criteria (recency of access, popularity) into a priority key that defines their ordering of objects

Which Criteria or Policy to use?

   Criteria          Policy
   -                 Random, FIFO, LIFO
   Time              LRU, MRU, GDS, GDSF, LRV
   Frequency         LFU, MFU, GDSF, LRV
   Size              GDSF, LRV
   Retrieval Cost    GDSF, LRV
   ID                Hash, Bloom filters
   Hop count         -
   QoS priority      stor-serv

FIFO: First In First Out; LRU: Least Recently Used; LFU: Least Frequently Used
GDSF: Greedy Dual Size with Frequency [Cao97, Arlitt99, Jin00]; LRV: Lowest Relative Value [Rizzo97]
Which Criteria or Policy to use?
• Policies are statically embedded in systems a priori
   – But we don't know where and how the system will be used
• Their performance is workload dependent
   – Some policies are better than others under certain workloads
   – Request streams may have sub-streams that favor other policies
• As the workload changes over time (hour, day, year), the performance of static policies degrades
• As the network topology changes, the workload changes


Solution: Be adaptive
• Systems and workloads are complex and under continuous change
• Manual tuning and monitoring is tedious, if possible at all
• Systems that adapt by exchanging messages and databases are not scalable
• Choose all policies!
• Automatically adjust to the best mixture of policies that current conditions require
   – How to mix cache policies?

How to mix? Biological Motivations
• Imagine all cache policies as species competing for food (documents or objects) in a habitat (the cache)
• The fitness of a species is based on how well it eats
   – The fitness of a policy is its hit rate or byte hit rate
• The population share (or frequency) of a species depends on its fitness
• Highly fit species may starve the others, and if conditions change the whole system collapses



How to mix? Biological Motivations
• Predators (probabilistically) prey on the most frequent (or easiest) species
• Predators protect diversity (mixing) among species
   – By preventing the fittest species from starving the others
• Our predator implementation (the resource manager) both assigns objects to caches and manages cache space
   – Problem: we cannot allow duplications → assign objects probabilistically
   – Problem: lucky draws (unfairness) can cause bad policies to gain fitness
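As a rough sketch of this probabilistic assignment (our own illustration, not the paper's code; the fitness values and the use of `random.choices` are assumptions):

```python
import random

def assign_object(obj_id, caches, fitness):
    """Pick one cache for the object, with probability proportional to
    each policy's fitness (e.g. its recent hit rate). Assigning each
    object to exactly one cache avoids duplication."""
    policies = list(caches)
    weights = [fitness[p] for p in policies]
    chosen = random.choices(policies, weights=weights, k=1)[0]
    caches[chosen].add(obj_id)
    return chosen

# Example: three policy-managed caches with hypothetical fitness scores
caches = {"LRU": set(), "LFU": set(), "GDSF": set()}
fitness = {"LRU": 0.30, "LFU": 0.20, "GDSF": 0.50}
print(assign_object("obj-42", caches, fitness))
```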
How to mix? Virtual Caches
• We define a pool of virtual caches, each simulating a single cache policy and its object ordering
• Virtual caches act as if they had the whole cache to themselves, but they only keep object header information, not the actual data (see the sketch below)
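A header-only virtual cache could be sketched as below, using LRU as the example policy (the class and field names are ours):

```python
from collections import OrderedDict

class VirtualLRUCache:
    """Simulates LRU over the full cache size, but stores only object
    metadata (id -> size), never the object data itself."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.meta = OrderedDict()  # id -> size, ordered by recency

    def access(self, obj_id, size):
        """Record an access; return True on a (virtual) hit."""
        if obj_id in self.meta:
            self.meta.move_to_end(obj_id)  # now most recently used
            return True
        # Miss: admit the header, evicting least-recently-used entries
        self.meta[obj_id] = size
        self.used += size
        while self.used > self.capacity:
            _, evicted_size = self.meta.popitem(last=False)
            self.used -= evicted_size
        return False
```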




Weighted Experts
• In machine learning terminology, "experts" are algorithms (e.g. LRU) that make predictions, denoted by the vector $\mathbf{x}_t$
• The weights of the experts ($\mathbf{w}_t$) represent the quality of their predictions
• The master algorithm predicts with a weighted average of the experts' predictions:
      $\hat{y}_t = \mathbf{w}_t \cdot \mathbf{x}_t$
• Depending on the true outcome $y_t$ (hit/miss), we incur a loss and then update the weights
      e.g. $\mathrm{Loss}(\hat{y}_t, y_t) = (1-0)^2 = 1$, as sketched below
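A minimal sketch of the master prediction and the square loss in this notation (the function names are ours):

```python
def master_predict(weights, predictions):
    """Weighted average of the experts' predictions: y_hat = w . x"""
    return sum(w * x for w, x in zip(weights, predictions))

def square_loss(y_hat, y_true):
    """e.g. predicting 1 (hit) when the outcome is 0 (miss) costs
    (1 - 0)^2 = 1."""
    return (y_hat - y_true) ** 2

# Example: three equally weighted experts voting hit(1)/miss(0)
weights = [1/3, 1/3, 1/3]
predictions = [1, 0, 1]
y_hat = master_predict(weights, predictions)
print(y_hat, square_loss(y_hat, 1))
```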

Fitting Caching into the Expert Framework
• In our paper we describe:
   – A pool of virtual cache policies voting for objects
   – The objects with the highest votes stay in the real cache
   – After the hits/misses, the weights are updated in proportion to the votes

• In the current implementation:
   – Virtual caches tell whether they had hits or misses
   – This is their prediction, or vote
   – Their predictions are compared to the true outcome
• Virtual policies that predict the workload well are rewarded with a weight increase, and vice versa
• The real cache looks like the virtual cache with the highest weight, but is still a mixture of multiple policies (one plausible wiring is sketched below)
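One way to wire up the per-request step, assuming header-only virtual caches like the `VirtualLRUCache` sketch above (the 0/1 loss convention follows the discrete loss on the next slide; everything else here is our illustration):

```python
def process_request(obj_id, size, virtual_caches, weights):
    """Feed one request to every virtual cache. Each cache's hit (1) or
    miss (0) is its vote; a virtual miss costs loss 1 and a virtual hit
    costs loss 0. The real cache is shaped like the leader, i.e. the
    highest-weight virtual cache."""
    votes = [1 if vc.access(obj_id, size) else 0 for vc in virtual_caches]
    losses = [1 - v for v in votes]
    leader = max(range(len(weights)), key=lambda i: weights[i])
    return losses, leader
```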

Weight Updates of Virtual Caches
• Discrete loss: a virtual cache incurs loss 0 on a (virtual) hit and loss 1 on a (virtual) miss

• Size-based loss (not all misses are equal!)
   – e.g. f(object size) = log(size), as sketched below
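A hedged sketch of the size-based loss (the slide gives only f(size) = log(size); treating hits as zero loss follows the discrete loss above):

```python
import math

def size_based_loss(hit, size_bytes):
    """Discrete loss scaled by object size: a miss on a large object
    costs more than a miss on a small one, here f(size) = log(size)."""
    if hit:
        return 0.0
    return math.log(size_bytes)

print(size_based_loss(False, 1024))     # ~6.93
print(size_based_loss(False, 2 ** 20))  # ~13.86, a costlier miss
```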

Machine Learning Algorithms

• Loss Update (Weighted Majority algorithm [LW92])

$$w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta L_{t,i}}}{\sum_{j=1}^{n} w_{t,j}\, e^{-\eta L_{t,j}}} \qquad \text{for } i = 1 \ldots n$$

where $\eta$ (eta) is the learning rate and the weights are initialized equally: $w_0 = (1/n, 1/n, \ldots, 1/n)$.
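This update transcribes directly into code (a sketch; the parameter names are ours):

```python
import math

def loss_update(weights, losses, eta=1.0):
    """Weighted Majority style loss update [LW92]:
    w_{t+1,i} = w_{t,i} * exp(-eta * L_{t,i}) / normalization."""
    unnorm = [w * math.exp(-eta * L) for w, L in zip(weights, losses)]
    z = sum(unnorm)
    return [w / z for w in unnorm]

# Weights start equal: w_0 = (1/n, ..., 1/n)
n = 4
weights = [1.0 / n] * n
weights = loss_update(weights, losses=[1, 0, 1, 0], eta=1.0)
print(weights)  # experts with loss 0 gain relative weight
```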




Share Update [Herbster&Warmuth95]
• The loss update learns quickly, but does not recover quickly (with $\eta = 1$, a single unit of loss shrinks a weight by $e^{-\eta L_{t,i}} = e^{-1} \approx 1/2.7$)
   – The curse of multiplicative updates (M. Warmuth)
• We must make sure weights do not become too small

• A pool is created from the shared weights
   – The algorithms are forced to share weight in proportion to their loss
• Weight is redistributed from the pool so that every policy keeps some minimal weight $w_{t+1,i}$
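A minimal sketch of such a sharing step (the contribution rate `alpha` and the even redistribution are our illustrative choices; the exact Fixed Share equations appear in the backup slides):

```python
def share_update(weights, losses, alpha=0.05):
    """Each expert contributes weight to a pool in proportion to its
    loss; the pool is then redistributed evenly, so that no policy's
    weight can decay to (near) zero and recovery stays fast."""
    n = len(weights)
    contrib = [alpha * L * w for w, L in zip(weights, losses)]
    pool = sum(contrib)
    return [w - c + pool / n for w, c in zip(weights, contrib)]

# One expert has starved the others and then starts losing
print(share_update([0.97, 0.01, 0.01, 0.01], losses=[1, 0, 0, 0]))
```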




ACME Design

[Figure: ACME design diagram]
Adaptive Policy vs. Fixed Policies
• The synthetic workload switches its nature to favor SIZE over LRU at 500 sec
• The adaptive policy can switch experts
[Figure: Hit ratio vs. time (sec), 100 to 1400 sec, for the LRU, SIZE, and ADAPTIVE policies]
NLANR-RTP Proxy Trace Results
• The adaptive policy chooses to stay with the best fixed policy (GDSF) and is just as good



[Figure: Hit rate (%) vs. request number, 0 to 150,000, for LRU, LFU, RAND, GDSF, LFUDA, and ADAPTIVE]
Weights of Virtual Caches
• GDSF was better than the other fixed policies most of the time
• There was still a little bit of mixing

[Figure: weights of the virtual caches over time]
Current Work
• Trying different adaptive mechanisms
   – Fixed Share, Variable Share, game heuristics
• Using different sets of policies
   – 12 algorithms currently implemented:
      • LRU, MRU, FIFO, LIFO, LFU, MFU
      • RAND, LSIZE, GDS, GDSF, LFUDA, GD*
• With different workloads
   – Web proxy, file system traces, synthetic
• With different topologies
   – N-level caches, UCSC network topology
• With different memory sizes

Conclusions [1]
• Today's systems are complex and have thousands of configuration parameters
• Real-life scenarios are dynamic
• With static policies:
   – either do continuous manual tuning and monitoring,
   – or get poor performance

• We tried to manually select heterogeneous policies for a 2-level cache
   – It was impossible to choose an exact policy for the second level
   – The overall performance of unexpected pairs could be just as good


Conclusions [2]
• Adaptive Caching using Multiple Experts:
   – Automatically switches policies, or selects a mixture of policies, to track the nature of the workload
   – No manual tuning or assumptions about the workload
• Performance will be at least as good as the best fixed policy, or better if there is mixing in the workload
• If all caches adaptively tune to the workload they observe, they do not need to exchange control messages or summarized databases (e.g. for exclusive caching)
• The system is very scalable and allows construction of globally distributed caching clusters


                        Thank You
• Machine Learning Group in Santa Cruz
  – Manfred Warmuth

• Game Theory Group
   – Robert Gramacy, Jonathan Panttaja, Clayton Bjorland


• Storage Systems Research Center (SSRC), UCSC

• Storage Technologies Dept. (STD), HP Labs, Palo Alto

• CERIA and WDAS Committee


References
•   [Arlitt99] M. Arlitt et al., "Evaluating Content Management Techniques for Web Proxy Caches," Proceedings of the 2nd Workshop on Internet Server Performance (WISP '99), 1999.
•   [Bousquet95] O. Bousquet and M. K. Warmuth, "Tracking a Small Set of Experts by Mixing Past Posteriors."
•   [Cao97] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS '97), 1997.
•   [Herbster&Warmuth95] M. Herbster and M. K. Warmuth, "Tracking the Best Expert," Proceedings of the 12th International Conference on Machine Learning (ICML '95), 1995.
•   [Jin00] S. Jin and A. Bestavros, "GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams," Proceedings of the 5th International Web Caching and Content Delivery Workshop, 2000.
•   [LW92] N. Littlestone and M. K. Warmuth, "The Weighted Majority Algorithm," UCSC-CRL-91-28, revised October 26, 1992.
•   [Rizzo97] L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache," IEEE/ACM Transactions on Networking, 8(2):158-170, 2000.




BACKUP SLIDES




Future Work
• Workload Characterization

[Diagram: cache policies (LRU, LFU, GDSF, FIFO, SIZE) as regions over the unique documents held in limited cache space, with axes for time, frequency, and size]

Something like:
   "My workload is of nature: 30% LRU, 40% size, 20% LFU"
Future Work
• Synthetic Workload Generation
• Adaptivity Benchmarks

[Diagram: the same policy-region view of unique documents held in limited cache space]

1. My workload is of nature: 30% LRU, 40% size, 20% LFU
2. Use the "stack distance" metric to form request subsequences
3. Merge subsequences into a synthetic workload
Cases for Adaptive Caching

• Where to use adaptive caching:
   – System memory and cooperative caches
   – Scalable OBSD (object-based storage device) clusters
   – Storage Embedded Network (SEN) clusters




SEN Device
• A router with embedded volatile and non-volatile storage, used for object caching
   – via object snooping in trusted routers
   – reduces client response time, network bandwidth, and server load
   – enables globally scalable networked caching clusters




A case for Adaptive Caching

[Diagram: UCSC network topology: OC12 to Stanford, 10 Mbps to Berkeley (backup), atm-sw, FrontDoor (OC3), uc-net, central, east, west, sciences, SOE, CSE]

• SEN vs. hierarchical proxies
• UCSC network topology
• Parameters changed:
   – workload
   – total amount of memory
   – link speeds
   – departmental correlations
   – replacement policies
• Metrics measured:
   – hit rates and byte hit rates
   – mean response times
   – server load reductions
Salient Design Features
• Globally Unique Object Identification (GUOID)
• Ad-hoc multicast support
• Backwards compatible
• Operation (see the sketch below):
   – SEN nodes cache all data objects passing through them
   – Clients request <GUOID, offset>
   – A SEN node sends a local copy if one exists
   – and forwards the request otherwise
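A minimal sketch of this request path (the class and method names are ours; a real SEN node would also run ACME to manage its cache space):

```python
class SENNode:
    """Minimal sketch of a SEN node: a dict-backed object cache plus an
    upstream fetch used when the object is not held locally."""
    def __init__(self, fetch_upstream):
        self.cache = {}
        self.fetch_upstream = fetch_upstream  # callable: guoid -> bytes

    def handle_request(self, guoid, offset):
        data = self.cache.get(guoid)
        if data is None:
            data = self.fetch_upstream(guoid)  # forward otherwise
            self.cache[guoid] = data           # snoop the passing object
        return data[offset:]                   # serve from the local copy

# Example: the upstream "origin" served from a dict
origin = {"urn:obj1": b"hello world"}
node = SENNode(lambda g: origin[g])
print(node.handle_request("urn:obj1", 6))  # b'world'
```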




Fixed Share Update [Bousquet95]

Let $\tilde{w}_{t+1,i}$ denote the weights after the loss update. Each expert gives up a fraction of its weight that grows with its loss; the shared weight is pooled and redistributed among the other experts:

$$\mathrm{pool} = \sum_{i=1}^{n} \left[ 1 - (1-\alpha)^{L_{t,i}} \right] \tilde{w}_{t+1,i}$$

$$w_{t+1,i} = (1-\alpha)^{L_{t,i}}\, \tilde{w}_{t+1,i} + \frac{1}{n-1} \left( \mathrm{pool} - \left[ 1 - (1-\alpha)^{L_{t,i}} \right] \tilde{w}_{t+1,i} \right)$$

where $\alpha$ is the share rate, $\alpha \in [0, 1)$, and $w_0 = (1/n, 1/n, \ldots, 1/n)$.
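A direct transcription of the two equations above into code (the example values of alpha and the losses are arbitrary):

```python
def fixed_share_update(tilde_w, losses, alpha=0.05):
    """Expert i keeps (1-alpha)**L_i of its loss-updated weight
    tilde_w[i]; the remainder goes into a pool that is split evenly
    among the other n-1 experts."""
    n = len(tilde_w)
    shared = [(1 - (1 - alpha) ** L) * w for w, L in zip(tilde_w, losses)]
    pool = sum(shared)
    return [(w - s) + (pool - s) / (n - 1)
            for w, s in zip(tilde_w, shared)]

# Example: four experts after a loss update; the leader just lost
tilde_w = [0.90, 0.04, 0.03, 0.03]
losses = [1, 0, 0, 0]
print(fixed_share_update(tilde_w, losses, alpha=0.1))
```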
Mixing Update [Bousquet95]
• Mixing update: the next weight vector is a convex combination of all past (loss-updated) posteriors

$$w_{t+1,i} = \sum_{q=0}^{t} \beta_{t+1}(q)\, w^{m}_{q,i}, \qquad \sum_{q=0}^{t} \beta_{t+1}(q) = 1$$

• Mixing schemes (how $\beta_{t+1}(q)$ distributes its mass over the past steps $q = 0 \ldots t$):
   – FS to Start Vector: mass $1-\alpha$ on the current posterior, mass $\alpha$ on the start vector ($q = 0$)
   – FS to Uniform Past: mass $1-\alpha$ on the current posterior, mass $\alpha/t$ on each past step
   – FS to Decaying Past: mass $1-\alpha$ on the current posterior, mass decaying as $\alpha/(t-q)$ over the past steps
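A sketch of how the three β schemes could be realized (normalizing the decaying-past coefficients to total mass α is our assumption; `past_posteriors[q]` holds the loss-updated weight vector w^m_q, and t >= 1 is assumed):

```python
def mixing_update(past_posteriors, t, alpha=0.1, scheme="start"):
    """Mixing past posteriors: w_{t+1} = sum_q beta(q) * w^m_q, with
    the beta coefficients summing to 1. Mass 1-alpha stays on the
    current posterior q = t; alpha is spread according to the scheme."""
    n = len(past_posteriors[0])
    if scheme == "start":          # FS to Start Vector
        beta = {0: alpha, t: 1 - alpha}
    elif scheme == "uniform":      # FS to Uniform Past
        beta = {q: alpha / t for q in range(t)}
        beta[t] = 1 - alpha
    else:                          # FS to Decaying Past
        raw = {q: 1.0 / (t - q) for q in range(t)}
        z = sum(raw.values())
        beta = {q: alpha * v / z for q, v in raw.items()}
        beta[t] = 1 - alpha
    w = [0.0] * n
    for q, b in beta.items():
        w = [wi + b * past_posteriors[q][i] for i, wi in enumerate(w)]
    return w
```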
Another View: Weighted Virtual Caches

[Figure: weighted virtual caches]

				