Caching Technologies for Web Applications

 Dr. C. Mohan, IBM Fellow
 IBM Almaden Research Center, San Jose, CA 95120, USA
 mohan@almaden.ibm.com http://www.almaden.ibm.com/u/mohan/

 9th International Conference on Database Systems for Advanced
 Applications (DASFAA), Jeju Island, Korea, 17 March 2004
 Agenda

           Introduction

           Granularity of Caching and Location of Caches

           Caching Scenarios

           Fragment Caching for Dynamic, Personalized Content

           Database Caching Architectures and Issues



Mar 2004                       DASFAA, Mohan                    2
 Motivation
           World Wide WAIT!  -->  Caching  -->  Happy Customers!


     Caching is widely used for improving performance in many contexts
     (e.g., processor caches in hardware and buffer pools in DBMSs)

              Where and what to cache in web context?
              Many caching points and many types of objects!

  Our focus: Transactional/Database applications, not internet search, etc.
 e-Business Application Characteristics
      Study by IBM's Conner, Copeland, Flurry
           Large # of registered and online users
           Load dominated by reads that grow without natural limits
           Greater tolerance for data that is not perfectly up to date
           24x365 availability: an increasingly strong goal
           e-Business application’s users are its customers
           Multi-tier architecture
           Multiple channels of access are the norm

 e-Business Scaling Principles

           Move interactions as close to user as possible
           Caching: key technique to reduce costs and improve
           response time
           Spread load over an expandable number of servers
           Segment load by tiers, and within tiers, to make the
           computations in each server more homogeneous
           Manageability, security and availability are critical
           factors in all design decisions


 Classical Web Setup
                                  User        ...      User




                                         Edge Server




           Web Server 1        Web Server 2        ...        Web Server n


                App Server 1       App Server 2        ...     App Server m




                                         DB

 What, Where, When and How
       Considerations for caching in web context are:
        What, when and where to cache
        Granularity of caching: web pages, fragments of pages,
        servlet execution result, SQL query result, data, ...
        Location of cache: client, proxy, edge-of-net, internet
        service provider (ISP), edge-of-enterprise, app server, web
        server, DBMS
        Caching and invalidation policies: application transparency,
        push vs. pull, freshness maintenance, triggers, log sniffing
        Enabling cache exploitation: routing, failover, accounting,
        authentication, authorization, ...
        Tools: performance monitoring, analysis
       Related DB Technologies: replication, materialized views,
       mediator systems, client-server DBMSs, buffer management,
       main-memory DBMSs, query optimization, content mgmt, ...
     For a comprehensive tutorial, see http://www.almaden.ibm.com/u/mohan/Caching_VLDB2001.pdf
 Points of Non-Database Caching
     [Figure: points of non-database caching between the browser and the backend DB]
       Enterprise or ISP:           browser cache, forward proxy cache
       Internet (exchange points):  edge-of-internet cache
       Enterprise:                  edge-of-enterprise cache, WCS/WAS/WS cache, DB
       WSES and ND nodes appear at several of these points
       Motivations shown:           response time, bandwidth;
                                    cost-performance, scalability
     Legend:
       DB: Database
       ND: Network Dispatcher
       WAS: WebSphere App Server
       WCS: WebSphere Commerce Suite
       WS: Web Server
       WSES: WebSphere Edge Server

 Caching Landscape

    End-to-end caching involves caching in 4 logical tiers
    Includes a content distribution, replication and invalidation
    backbone
    Caching            Edge of Enterprise/   Presentation Logic     Business Logic        Data
    segments           Network

    Where is it        Routers               Application Server     Application Server    Database Server
    cached?            Web Servers           J2EE Web Container     J2EE EJB Container
                       Edge Servers                                 Datasources

    What does it       HTTP Content          HTTP Responses from    Command Result        In-memory cache
    cache?             (HTML/XML/GIFs)       JSP/Servlet            Caching               Data replicas
                       HTTP Content          (HTML/XML/GIF/etc.)    EJB Caching
                       Fragments             Command Results        JDBC Results
                       ESI Fragments         Web Services Results   Structured Data
                       Java Objects                                 Objects

    Spanning all segments: distributed replication and notification
 WebSphere Dynamic Caching Service
     [Figure: WebSphere Dynamic Caching Service ("Dynacache") in WAS v5.0]
       Cacheable items:  Servlets, JSPs, ESI tags, Commands, Web Services, EJBs
       Configuration:    Cachespec XML, APIs
       Object storage services:  Java memory, hashtable on disk
       Invalidation and replication:
         DRS (data replication service) to remote dynamic caches in a WAS cluster
         External cache adapter to: Akamai network, IBM HTTP Server Plug-in,
         IBM Edge Server Caching Proxy, AFPA Kernel Cache
       Features:
         Hashtable-like data structure
         Replacement policy (LRU / priorities)
         Overflow to disk
         XML cache policy management (ID generation)
         Cached data replication
         External cache coordination
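
     The three storage features listed above (hashtable-like structure, LRU
     replacement, overflow to disk) can be sketched as follows. This is an
     illustration only, not the Dynacache API: the class name
     LruCacheWithOverflow is hypothetical, and a plain dictionary stands in
     for the on-disk hashtable.

```python
from collections import OrderedDict


class LruCacheWithOverflow:
    """Hypothetical illustration only; not the Dynacache API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()   # in-memory entries, least recently used first
        self.disk = {}                # assumption: a dict stands in for the disk hashtable

    def put(self, key, value):
        self.disk.pop(key, None)      # drop any stale overflowed copy
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.disk[old_key] = old_value   # overflow instead of discarding

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:
            value = self.disk.pop(key)
            self.put(key, value)      # promote back into memory
            return value
        return None                   # miss

    def invalidate(self, key):
        self.memory.pop(key, None)
        self.disk.pop(key, None)
```

     A priority-based policy would replace the eviction order used in put();
     the rest of the structure could stay the same.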
  Scenarios
     Data Scenario - Commerce
           Catalog caching - Thousands of entries
           Session caching and personalization data
           Shopping cart caching in business logic tier
           Page assembly of fragments
     Presentation Scenario - On-line Trading
           Page assembly of fragments
           Application offload with cache (quotes, watch list)
           Active data caching (push/pull)
           XML/XSLT
     In Network Scenario - Portal
           Page assembly of fragments
           Web services client cache
           In Network caching using Akamai content distribution network
     Distribution and Versioning Scenario
           Upgrading to a new version of the online trading application
     Intranet Scenario
           Branch office
           Forward proxy

 HTTP Caching


              browser -> browser cache -> proxy cache -> server cache -> web server
              (requests travel left to right; responses return along the same path)




           Multiple caches - between browser and server
           HTTP headers control
             whether or not a page can be cached
             cache expiration time (Time To Live - TTL)
           Full pages and images can be cached
             unable to cache HTML fragments
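
     As a rough sketch of how a shared cache might read those headers, the
     function below derives a cacheability decision and a TTL from the
     Cache-Control response header. It is deliberately simplified: the
     function name cache_policy is hypothetical, only the no-store, no-cache,
     private and max-age directives are examined, and the header is assumed
     to arrive in a dict under the exact key "Cache-Control".

```python
def cache_policy(headers):
    """Return (cacheable, ttl_seconds) for a shared cache. Simplified sketch."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"')

    if "no-store" in directives or "private" in directives:
        return (False, 0)
    if "no-cache" in directives:
        # Simplification: "revalidate before every use" is treated as uncacheable.
        return (False, 0)
    max_age = directives.get("max-age", "")
    if max_age.isdigit():
        return (True, int(max_age))
    # No explicit freshness information: this sketch chooses not to cache.
    return (False, 0)
```

     Because the decision applies to the whole response, a page with one
     volatile fragment cannot be cached at all under this model.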

 Dynamically-Generated Pages


   Increased generation due to
      Database-centric e-commerce apps
      Frequently updated content
      Personalization
      Access device differences
   Proxy caching is ineffective for such pages




 Caching HTML Fragments

     "When part of a page is too volatile to cache, the rest of the page can still be cached."


     Product display page JSP (externally requested fragment), built from
     dynamic content fragments (HTML/XML):

       Fragment                          Dynamic content data          Content update rate
       product gif url,                  product data command          per week
       product detail display command
       personalized greeting             shopper data command          per minute
       abbreviated shopping cart JSP     shopping cart data command    per session
       ad gif URL command                (none shown)                  continuous

     Legend in the figure: cached, uncached (per-box marking not preserved here)
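
     A minimal sketch of page assembly from fragments with different update
     rates, assuming each fragment carries its own time-to-live and a TTL of 0
     marks a fragment as uncacheable. The names FragmentCache and
     assemble_page are hypothetical and do not come from any product.

```python
import time


class FragmentCache:
    """Hypothetical helper: caches rendered fragments with a per-fragment TTL."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}   # fragment name -> (expires_at, html)

    def get(self, name, ttl, render):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]                 # fresh cached copy
        html = render()                   # miss or expired: regenerate
        if ttl > 0:                       # ttl == 0 marks an uncacheable fragment
            self.entries[name] = (now + ttl, html)
        return html


def assemble_page(cache, fragments):
    """fragments: list of (name, ttl_seconds, render_function) in page order."""
    return "".join(cache.get(name, ttl, render) for name, ttl, render in fragments)
```

     With a one-week TTL on the product fragment and a TTL of 0 on the ad
     fragment, repeated requests regenerate only the ad.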
 Fragment Caching Goals
           Achieve benefits of cache for personalized pages
             Improved price/performance
             Improved response time
           Reduce cache storage requirements
             By sharing common fragments among multiple
             pages
           Support contracts & member groups in commerce
           apps
           Move cache into the network to multiply above benefits



 Enterprise JavaBeans (EJBs) and Caching
       Container can select from 3 commit-time options:
           Option A: Container caches a “ready” instance between
           transactions. Container ensures the instance has exclusive
           access to the object state in the persistent store. Therefore,
           Container doesn't have to synchronize the instance’s state
           from the persistent store at the start of the next transaction.
           Option B: Container caches a “ready” instance between
           transactions. Container doesn't ensure that the instance has
           exclusive access to the object state in the persistent store.
           Therefore, Container must synchronize the instance’s state
           from the persistent store at the start of the next transaction.
           Option C: Container doesn't cache a “ready” instance
           between transactions. Container returns the instance to the pool
           of available instances after a transaction completes.
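
     The difference between the three options comes down to what the
     container keeps after a transaction and whether it trusts that copy at
     the start of the next one. The toy model below illustrates this; the
     functions begin_transaction and end_transaction are hypothetical and are
     not part of the EJB API.

```python
def end_transaction(option, state):
    """Return what the container keeps cached after the transaction."""
    if option not in ("A", "B", "C"):
        raise ValueError("unknown commit option: %r" % (option,))
    return None if option == "C" else state   # C: instance goes back to the pool


def begin_transaction(option, cached_state, load_from_store):
    """Return the instance state to use at the start of the next transaction.

    cached_state: whatever end_transaction kept, or None.
    load_from_store: callable that reads current state from the persistent store.
    """
    if option not in ("A", "B", "C"):
        raise ValueError("unknown commit option: %r" % (option,))
    if option == "A" and cached_state is not None:
        return cached_state        # exclusive access: cached state is trusted
    return load_from_store()       # B: resynchronize; C: nothing was cached
```

     Only Option A avoids the read from the persistent store, which is why it
     depends on the container having exclusive access to the data.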
 Database Caching
           Corporate data: backbone of eCommerce apps
            Static data cached at app level (e.g., catalogs in
            WebSphere Commerce Server)
           Current Strategies
            App-aware caching model - OK for app-specific data
            (web pages, images, etc.)
            Replication: cannot easily track usage patterns, not
            dynamic enough to adapt to changing access patterns
           Caching of database data
            Scalability: offload work from backend database server
            Reduced response time: caching at an edge server
            Reduced cost of ownership (e.g., use less reliable,
            cheaper machines for caching)
            Improved throughput, congestion control, availability,
            quality of service

Web Setup with Data Caching
                              User        ...      User




                                     Edge Server




           http Server 1        http Server 2        ...        http Server n


                    AS 1             AS 2          ...         AS m
               DBCache       DBCache                       DBCache



                                                      Goal: Help improve scalability and
                                                       performance of e-business apps
                                     DB
Web Setup with Integrating Data Cache
                               User          ...    User




                                      Edge Server




           http Server 1        http Server 2        ...        http Server n


                    AS 1              AS 2          ...          AS m
               DBCache       DBCache                        DBCache




                                 DB2                       IDS

Web Setup with Mid-Tier and Edge Data Cache
                               User      ...        User




             Edge Server              Edge Server               Edge Server

               DBCache                 DBCache                      DBCache


           http Server 1        http Server 2        ...        http Server n


                    AS 1              AS 2          ...         AS m
               DBCache       DBCache                        DBCache




                                 DB2                       Oracle
 Cache Data Model Requirements

           Application's SQL shouldn't have to change
           Application's DB schema shouldn't have to change
           Support failover of nodes
           Support reasonable update semantics
           Support dynamic addition/deletion of app server nodes
           Limits on update propagation latencies




 Cache Data Model Choices
     Cloned: Each table is identical to a backend table
       Pros
         DDL definition is easy
         Every query can be satisfied anywhere
       Cons
         Updates need to reach multiple nodes
         Adding nodes involves copying a lot of data
     Subset: Each table is a proper subset of a backend table
       Pros
         Updates need to be applied at only a minimal number of sites
         Performance: Smaller DBs in cache nodes
       Cons
         Complex routing: integrate with edge server
         Complexity in DDL spec and query processing
         Complex update logic - know who owns records being changed

   Why Dynamic DB Caching?

           When full-table caching is not an option
              Limited resources to cache large tables
              High maintenance costs

           Unable to predict what to cache up-front due to changing access
           patterns
              An adaptive solution is vital for higher hit ratio and hence better
              performance - similar to traditional buffer pool management
              An infrastructure is needed to cache “hot items”

           Unable to cope with the task of content specification for thousands of
           nodes
              Low cache admin cost is important


 Dynamic Cache Model
           Defining what is to be cached
              Example: Caching data based on customer type

           Cache key: custType on the Customer table; matching rows are
           loaded on demand from the backend database into the cache database
              Example DDL:
              ALTER TABLE customer
              ADD CACHE KEY(custType)

           Cache group: Customer, Order, Orderline, Product
           (join columns shown in the figure: cid, oid, pid)

           Referential cache relation
              Example DDL:
              ALTER TABLE orderline ADD CACHE
              REFERENCE FROM (pid) TO product(pid)
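
     On-demand loading by cache key can be sketched as below. The sketch
     assumes that when a cache-key value is first requested, all backend rows
     with that value are loaded together, so the cache is complete for every
     value it holds. The class CacheTable is hypothetical, and a list of
     dicts stands in for the backend table.

```python
class CacheTable:
    """Hypothetical model of a cache table keyed on one column (e.g. custType)."""

    def __init__(self, backend_rows, cache_key):
        self.backend_rows = backend_rows   # stand-in for the backend table
        self.cache_key = cache_key
        self.rows = []                     # rows currently held in the cache
        self.loaded_key_values = set()     # key values whose rows are all present

    def select_by_key(self, value):
        if value not in self.loaded_key_values:
            # Miss: bring in every backend row with this key value.
            self.rows.extend(r for r in self.backend_rows
                             if r[self.cache_key] == value)
            self.loaded_key_values.add(value)
        return [r for r in self.rows if r[self.cache_key] == value]
```

     A cache reference such as orderline(pid) to product(pid) would extend
     the load step to pull in the referenced rows as well; that is omitted
     here.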
 Query Routing for Dynamic Cache Tables: Janus Plan

           Input query is compiled, using cache constraints, into a plan with:
              Probe Query (generated from the query predicates)
              Switch Condition feeding a Switch Union over two legs:
                 Local query involving Cache Tables & Nicknames
                 Remote query involving ONLY Nicknames
           At runtime:
           (1) Execute the Probe Query first
           (2) Depending on the result, execute one of the legs
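
     The two runtime steps can be sketched as a plan that wraps a probe and
     two legs. Everything here is illustrative: the function janus_plan and
     the sample data are hypothetical, and dictionaries stand in for the
     cache tables and the backend.

```python
def janus_plan(probe, local_leg, remote_leg):
    """Build a plan that runs the probe first, then exactly one leg."""
    def execute(*args):
        if probe(*args):
            return ("local", local_leg(*args))
        return ("remote", remote_leg(*args))
    return execute


# Hypothetical sample data: only customer type "gold" is present in the cache.
cached_types = {"gold"}
cache = {"gold": ["alice"]}
backend = {"gold": ["alice"], "silver": ["bob"]}

plan = janus_plan(
    probe=lambda cust_type: cust_type in cached_types,
    local_leg=lambda cust_type: cache[cust_type],
    remote_leg=lambda cust_type: backend[cust_type],
)
```

     The same compiled plan serves both cases, so the choice between cache
     and backend is made per execution rather than per compilation.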
 Cache Refresh

           Automatic
             time-driven
             immediate (synchronous, asynchronous)
           On-demand
           Refresh brings new content or invalidates cached content
           Mutual consistency of related data (transaction guarantee)




 Updates - Push Vs Pull
              Push: the source sends updates to the cache
              Pull: the cache fetches from the source
           Push advantages
             reduced response time for first hit
             overwrite: less total path length than invalidation + pull
           Pull advantages
             everything doesn't have to fit in cache
               not in cache until accessed
               only hottest stay in cache long term
             personalized info cached where needed, not everywhere
             easy to get context required to execute Java ServerPage
             (JSP) - a real request will be underway when executing a
             JSP
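
     A minimal sketch of the two styles, under the assumption that a pushed
     update simply overwrites the cached copy. The class names Source and
     PullCache are hypothetical.

```python
class Source:
    def __init__(self):
        self.data = {}
        self.push_targets = []          # caches that receive pushed updates

    def write(self, key, value):
        self.data[key] = value
        for cache in self.push_targets:
            cache.entries[key] = value  # push: overwrite the cached copy


class PullCache:
    def __init__(self, source):
        self.source = source
        self.entries = {}
        self.misses = 0

    def read(self, key):
        if key not in self.entries:     # pull: fetch only on first access
            self.misses += 1
            self.entries[key] = self.source.data[key]
        return self.entries[key]
```

     A cache registered in push_targets has no miss on the first hit, while
     an unregistered cache holds only what has been read through it. A pure
     pull cache would also need invalidation or a TTL to notice later writes,
     which this sketch leaves out.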
 Update Handling
           Where to perform the updates first
             Cache only
             Backend only
             Both places (2-phase commit - cost, availability issues)
           Asynch propagation if update performed initially in only
           one place
           Semantic problems
             Users not seeing their own updates
             Update replay on other copies (e.g., on logical redo if
             identity columns are involved)
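
     The "users not seeing their own updates" problem follows directly from
     asynchronous propagation. The sketch below assumes the update is
     performed at the backend only and reaches the cache later through a
     queue; all names are hypothetical.

```python
from collections import deque

backend = {"balance": 100}
cache = dict(backend)        # the cache starts as a copy of the backend
pending = deque()            # changes made at the backend, not yet applied to the cache


def update_backend(key, value):
    """Perform the update at the backend only and queue it for the cache."""
    backend[key] = value
    pending.append((key, value))


def read_from_cache(key):
    return cache[key]


def propagate():
    """Asynchronous step: apply queued changes to the cache in order."""
    while pending:
        key, value = pending.popleft()
        cache[key] = value
```

     Between update_backend() and the next propagate(), a read served from
     the cache still returns the old value, even to the user who made the
     change.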


   Population/Maintenance: Declarative Cache Tables
     Input query:
        SELECT deposit
        FROM Account
        WHERE branch = 'San Jose'

     Query Router (MQT Matching) in the Cache DB considers:
        * Currency Settings
        * Query Type
        * CT Predicate
     and answers the query either from the local cache table Account or,
     through the nickname NN.Account, as a remote query (IUD queries, DDLs)
     against the Account table in the backend database.

     Maintenance: the Capture program collects changes at the backend and
     the Apply program applies them to the cache table (DPropR utility,
     replication subscriptions).

     DDL to create the cache table:
        CREATE CACHE TABLE accounts AS
           (SELECT * FROM nn.accounts
            WHERE branch = 'San Jose');

   Population/Maintenance: Dynamic Cache Tables
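     [Figure: population and maintenance of dynamic cache tables]
       Input query arrives at the DB2 instance (DBCache), which returns the
       result; remote queries go to the backend DB (by federated DB)
       Cache keys used in select queries are passed via MQ to the cache
       daemon, which issues insert / delete statements against the cache
       The backend DB also receives updates from other sources; the Capture
       program reads the changes and generates invalidation messages
       Invalidation messages reach the cache daemon via MQ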
 Data Caching Choices

           If cache in DB process, app server will incur costs of
             process boundary crossing
             data conversions
           Alternative: cache data in app server process in
             relational form (JDBC query results) for servlets,
             session/entity beans
             Java object form for entity beans
           App server would have to manage cache coherency,
           especially in cluster environment!



 Query Result Caching
           Cache could exist within or outside DBMS (e.g., integrate
           with JDBC driver)
           Results Tracking
             Individual results kept separately
             Results combined to avoid duplicate subsets
           Cache used to answer
             only repeated (exact match) queries: just return bits in
             bucket, no need for complex query engine
             any query for which answer is in cache (subset or union
             of earlier queries): need query processing capabilities
           Full exploitation of external-to-DBMS cache requires
             replication of significant DB query handling functions
             managing invalidation of results is much more difficult
             modified queries to be sent to backend DB to retrieve
             all columns and all qualifying rows
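
     The simplest case above, answering only exact-match repeated queries,
     can be sketched as a lookup table in front of the query function. The
     class QueryResultCache is hypothetical; the caller is assumed to name
     the tables each query reads, and invalidation is done per table, which
     is coarser than what a real product might do.

```python
class QueryResultCache:
    """Hypothetical exact-match result cache in front of a query function."""

    def __init__(self, execute):
        self.execute = execute      # callable(sql, params) -> list of rows
        self.results = {}           # (sql, params) -> rows
        self.by_table = {}          # table name -> set of cached keys

    def query(self, sql, params, tables):
        key = (sql, tuple(params))
        if key not in self.results:
            self.results[key] = self.execute(sql, params)
            for table in tables:    # assumption: caller lists the tables read
                self.by_table.setdefault(table, set()).add(key)
        return self.results[key]

    def invalidate_table(self, table):
        """Drop every cached result that read from the given table."""
        for key in self.by_table.pop(table, set()):
            self.results.pop(key, None)
```

     Answering a query from a subset or union of earlier results would
     require comparing predicates, which is where the query processing
     capabilities mentioned above come in.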
 Database Caching Products

           A number of companies are currently active in this space
           Not really a new area!
             OODBMSs did non-persistent caching on clients
             Some systems did caching beyond transaction commit
             by using call-back locking
           Scope being extended to persistent cache and edge of net
           Main memory technologies being exploited in some cases




 Related Work
           Database cache products
              Oracle, TimesTen, NEC CachePortal, Microsoft SQL Server “Yukon”,
              Chutney Technologies, Tangosol, …


           Query result caching
              DBProxy (IBM Watson), Semantic Caching

           See my VLDB2001 tutorial for detailed comparisons

              “Caching Technologies for Web Applications”,
              http://www.almaden.ibm.com/u/mohan/Caching_VLDB2001.pdf



 Challenging Open Issues
           Access control at the edge
           Standardized naming for fragments
           Session state tracking and failover support
           Synchronization directly between caches
           Cache maintenance in the presence of referential constraints
           Cache content purging algorithms
           Performance monitoring and tuning DBA tools
           Load balancing or cache-intelligent URL routing
           Describing cache contents for use in query rewrite
           More sophisticated query optimization criteria
           Efficient relational to Java object mapping
           XML data caching
           Web service result caching

 References

           “Tutorial: Caching Technologies for Web Applications”, C. Mohan, 27th International
           Conference on Very Large Databases (VLDB2001), Rome, September 2001.
           http://www.almaden.ibm.com/u/mohan/Caching_VLDB2001.pdf
           “Cache Tables: Paving the Way for an Adaptive Database Cache”, M. Altinel, C.
           Bornhövd, S. Krishnamurthy, C. Mohan, H. Pirahesh, B. Reinwald, Proc. 29th
           International Conference on Very Large Databases (VLDB2003), Berlin,
           September 2003.
           http://www.almaden.ibm.com/u/mohan/VLDB2003DBCachePaperS22P01.pdf
           “How WebSphere Caches Dynamic Content for High-Volume Web Sites”, IBM
           High-Volume Web Site Team, December 2002.
           http://www7.software.ibm.com/vadd-bin/ftpdl?1/vadc/wsdd/hvws/cache12_15.pdf
           Internet Caching Resource Center, http://www.caching.com/
           Brian D. Davison’s Web Caching and Content Delivery Resources,
           http://www.web-caching.com/
           “Scaling Up e-Business Applications with Caching”, M. Conner, G. Copeland, G.
           Flurry, DeveloperToolbox Magazine, August 2000.



