
webcaching_nov2006


         How to live with low/intermittent bandwidth/connectivity



         Krithi Ramamritham
               IIT Bombay
         krithi@cse.iitb.ernet.in
                  Web Content
• Web sites have traditionally served static
  content

• But, dynamic content generation has come into
  vogue
  – generated on the fly by running dynamic scripts, e.g., Active
    Server Pages (ASP), Java Server Pages (JSP), Servlets
  – allows generation of different content for the same request




                                                                    2
                       Dynamic Web Pages…

[Figure: a Web Page built from an Ad Component, a Navigation Component,
 four Headline Components, and a Personalized Component]

                         A News content site                  3
                  Generic Architecture

[Figure: End-hosts (wired hosts, mobile hosts) connected through networks
 to Data sources (sensors, servers)]

                                                          4
         Coherency of Dynamic Data
• Strong coherency
   – The client and source always in sync with each other
   – Strong coherency is expensive!
• Relax strong coherency: Δ-coherency
   – Time domain: Δt-coherency
      • The client is never out of sync with the source by more
        than Δt time units
      • e.g., traffic data not stale by more than a minute
   – Value domain: Δv-coherency
      • The difference in the data values at the client and the
        source is bounded by Δv at all times
      • e.g., only interested in temperature changes larger than
        1 degree                                                  5
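
The two relaxed coherency notions above can be written as simple predicates. This is a minimal sketch, not taken from the slides: the function names and the parameters max_staleness and max_difference (the time bound and the value bound) are assumptions chosen for illustration.

```python
from typing import Optional


def is_time_coherent(now: float,
                     first_missed_update_at: Optional[float],
                     max_staleness: float) -> bool:
    """Time-domain check: the client may lag the source by at most max_staleness.

    first_missed_update_at is the time of the earliest source update that the
    client has not yet seen, or None if the client holds the current value.
    """
    if first_missed_update_at is None:
        return True
    return now - first_missed_update_at <= max_staleness


def is_value_coherent(client_value: float,
                      source_value: float,
                      max_difference: float) -> bool:
    """Value-domain check: client and source differ by at most max_difference."""
    return abs(source_value - client_value) <= max_difference
```

For the traffic example, max_staleness would be 60 seconds; for the temperature example, max_difference would be 1 degree.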
                  Generic Architecture

[Figure: End-hosts (wired host, mobile host) connected through a network
 to Proxies/caches, which connect through a second network to Data sources
 (sensors, servers)]

                                                                  6
                 The Push Approach

            Server --Push--> Proxy --Push--> User




• Proxy registers the data item of interest
  and the coherency requirement with the
  server
• Server pushes interesting changes

+ Achieves strong consistency
+ Keeps network overhead to a minimum
-- Poor scalability (the server has to maintain state
  and keep connections open)
-- Low resiliency                                      7
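
A rough sketch of the push approach, under stated assumptions: the class names PushServer and Registration, the numeric value-domain bound, and the notify callback (standing in for an open connection to the proxy) are all hypothetical. It illustrates why the server must keep per-proxy state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Registration:
    bound: float                          # proxy's value-domain coherency requirement
    notify: Callable[[str, float], None]  # stands in for an open connection to the proxy
    last_pushed: Optional[float] = None   # per-proxy state the server must maintain


class PushServer:
    def __init__(self) -> None:
        self._registrations: Dict[str, List[Registration]] = {}

    def register(self, item: str, bound: float,
                 notify: Callable[[str, float], None]) -> None:
        """A proxy registers a data item together with its coherency requirement."""
        self._registrations.setdefault(item, []).append(Registration(bound, notify))

    def update(self, item: str, value: float) -> None:
        """Push only changes that would otherwise violate a proxy's bound."""
        for reg in self._registrations.get(item, []):
            if reg.last_pushed is None or abs(value - reg.last_pushed) > reg.bound:
                reg.notify(item, value)
                reg.last_pushed = value
```

With a bound of 1.0, the update sequence 20.0, 20.4, 21.2, 21.9, 22.5 results in only 20.0, 21.2, and 22.5 being pushed.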
                 The Pull Approach

            Server <--Pull-- Proxy --Push--> User


Proxy pulls from the server when one of these expires:
   Time To Live (TTL)
   Time To next Refresh (TTR / TNR)

  + Can be implemented using the HTTP protocol
  + Stateless and hence is generally scalable with respect to state
    space and computation
  – Weak cache consistency
  – Heavy polling for stringent coherency requirements or highly
    dynamic data
  – Network overheads higher than for Push
                                                                      8
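
A minimal sketch of the pull approach with a fixed TTL, assuming a hypothetical PullProxy class whose fetch callable stands in for an HTTP request to the server. Time is passed in explicitly so the behaviour is easy to follow; the server keeps no per-proxy state.

```python
from typing import Any, Callable, Dict, Tuple


class PullProxy:
    def __init__(self, fetch: Callable[[str], Any], ttl: float) -> None:
        self._fetch = fetch      # stands in for an HTTP GET to the server
        self._ttl = ttl          # seconds a cached value is served without polling
        self._cache: Dict[str, Tuple[Any, float]] = {}  # item -> (value, fetched_at)
        self.polls = 0           # number of requests sent to the server

    def get(self, item: str, now: float) -> Any:
        entry = self._cache.get(item)
        if entry is None or now - entry[1] >= self._ttl:
            value = self._fetch(item)      # TTL expired (or first access): poll
            self.polls += 1
            self._cache[item] = (value, now)
            return value
        return entry[0]                    # may be stale: weak consistency
```

A change at the source that happens between two polls is invisible to the user until the TTL expires; shortening the TTL tightens coherency at the cost of more polling.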
         Typical End-to-end
        Web Site Architecture

[Figure: Users -> Web Server Cluster -> Application Server Cluster -> Data]

                                                 9
                     WS vs. AS
• Web servers
  – Do well-defined and quantifiable local work
     • e.g., processing HTTP headers, serving static content

• Application servers
  – Run multi-layer programs
     • e.g., scripts involving calls to backends




                                                               10
     Inside the Application Layer
                      3-tier model

  PRESENTATION:     JSP, ASP
  BUSINESS LOGIC:   Servlets, COM+, EJB
  DATA CONNECTOR:   JDBC, ODBC
  ADDT’L SERVICES:  Commerce, Content Mgt., Personalization
  Behind the data connector: Databases, Legacy Systems
                                                  11
     Inside the Application Layer…

[Figure: the same 3-tier model, with Code Block(s) in the PRESENTATION and
 BUSINESS LOGIC tiers, showing the path of one request]

  1. JSP invokes a Servlet
  2. Servlet contacts CMS
  3. CMS requests data
  4. DBMS calls storage system
                                                                    12
Performance and Scalability Issues
• Computationally-intensive logic executed at
  multiple tiers

• Cross-tier communication

• Object instantiation and cleanup processing

• External I/O calls

• Database connection pool latencies

• Content conversion and formatting
                                                13
Optimizing the Application Layer
           Traditional Means


  • Optimize each tier independently:
     – Presentation-level caches built inside application server
       processes
     – Main memory database employed over persistent DBMS
     – Persistent object storage techniques employed inside
       content management systems … and so on


[Figure: the 3-tier model (PRESENTATION: JSP, ASP; BUSINESS LOGIC: Servlets,
 COM+, EJB; DATA CONNECTOR: JDBC, ODBC; ADDT’L SERVICES), annotated with
 "Local cache and optimization code"]

                                                                       14
   Query result caching


• Many application server products
  offer this feature
  -- mitigates only local database access latency
  -- only a subset of query results may be reused
    in page generation
  -- page fragments may not all be from
    databases




                                               15
 Middle tier database caching

• Caching database tables in main memory
    Oracle 9i Cache
    Main-memory databases, e.g., TimesTen
     -- mitigates only database access latency
     -- caching at table granularity results in poor
       cache utilization
     -- main-memory databases are difficult to
       integrate and maintain and can be expensive


                                                  16
               Page Level Caching
• Dynamically generated HTML pages are cached

  + Can completely offload work from web/app server
  – Low reusability for highly personalized web pages
  – URL may not uniquely identify a page
    -- increasing the risk of delivering incorrect pages
  – Often introduces excessive invalidations
    -- e.g., even if a single element on the page changes




                                                            17
Optimizing the Application Layer
                       Issues


 • Traditional techniques impact specific components
   within the application, but not the entire application

    – No mitigation of component-to-component interaction latencies

    – Different synchronization and invalidation policies risk
      data integrity

    – Each optimization scheme consumes programmer time
      for development and maintenance




                                                                  18
              Key ideas



• Re-use program results to eliminate redundant work


• Facilitate single-point, architecture-wide optimization

• Apply to both programmatic objects and result fragments




                                                       19
              Optimizing the Application Layer

[Figure: the 3-tier model (PRESENTATION: JSP, ASP; BUSINESS LOGIC: Servlets,
 COM+, EJB; DATA CONNECTOR: JDBC, ODBC; ADDT’L SERVICES: Commerce,
 Content Mgt., Personalization; Databases, Legacy Systems) with a cache
 attached]

The cache enables the results of programs to be re-used.
                                                                  20
                                          Usually….

[Figure: the 3-tier model with Code Block(s) in the PRESENTATION and
 BUSINESS LOGIC tiers, showing the path of one request]

  1. JSP invokes a Servlet
  2. Servlet contacts CMS
  3. CMS requests data
  4. DBMS calls storage system

Plus, at each step there are communication delays and logic processing delays
                                                                                     21
 Novel Solution…

[Figure: Code Block(s) in the PRESENTATION and BUSINESS LOGIC tiers (above
 the DATA CONNECTOR: JDBC, ODBC) call a real-time storage engine through an
 Appl. Programming Interface of Chutney tags; each stored entry holds a
 Function, its Parameter(s), and the Result]

• The storage engine can store any program output, but this is most commonly
  an HTML fragment or a Programmatic Object.
• Tags trigger calls to the storage engine. When the Result of a Function
  with a specific Parameter set is already known (and up-to-date), the work
  normally necessary to produce that Result is bypassed.
                                                                                22
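
The (Function, Parameter(s), Result) idea can be sketched as a keyed result store. This is an illustration only, not the Chutney API: the class ResultStore, its methods, and render_headlines are hypothetical names, and explicit invalidation stands in for whatever mechanism keeps results up-to-date.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ResultStore:
    """Maps (function name, parameters) to a previously computed result."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Tuple[Hashable, ...]], Any] = {}

    def lookup_or_run(self, name: str, params: Tuple[Hashable, ...],
                      work: Callable[..., Any]) -> Any:
        key = (name, params)
        if key in self._results:
            return self._results[key]   # result known: the work is bypassed
        result = work(*params)          # result unknown: do the work once
        self._results[key] = result
        return result

    def invalidate(self, name: str, params: Tuple[Hashable, ...]) -> None:
        """Drop a result that is no longer up-to-date."""
        self._results.pop((name, params), None)


def render_headlines(section: str) -> str:
    # Hypothetical code block: in a real page this would involve application
    # logic, database calls, and HTML formatting.
    return f"<ul><li>Top story in {section}</li></ul>"


store = ResultStore()
fragment = store.lookup_or_run("render_headlines", ("sports",), render_headlines)
```

The stored value here is an HTML fragment, but any program output with hashable parameters could be stored the same way.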
Code Blocks Perform Work

[Figure: a page generation script is a sequence of code blocks, each of
 which writes to Out; the work inside a code block includes application
 logic, database calls, HTML formatting, ...]

                                       23
       Code Blocks <-> Components

[Figure: each code block of the page generation script writes to Out and
 corresponds to a component of the Web Page: Ad Component, Navigation
 Component, Headline Components, Personalized Component
 (Example: News content site)]

Certain components can be cached
                                                                                         24
           DCA: Our Solution

[Figure: in the page generation script, a code block is wrapped in a Start
 tag and an End tag. A Request goes to the Dynamic Content Accelerator and
 the Code Block Output comes back; the block's work (application logic,
 database calls, HTML formatting) is bypassed]

                                                                  25
          DCA in a Typical End-to-end
            Web Site Architecture

  • A single instance of the DCA serves a rack of
    application servers

  • Application servers communicate with DCA through a
    lightweight API

[Figure: Users -> Web Server Cluster -> Application Server Cluster -> Data,
 with the Dynamic Content Accelerator attached to the Application Server
 Cluster]

                                                                    26
          Cache Management

• A critical aspect of any caching solution

• DCA supports novel cache management
  strategies:
   – Prediction-based cache replacement
   – Observation-based cache invalidation




                                              27
                      Cache Replacement
• Prediction-based replacement
   – fragments having lowest probability of access replaced
       – Least-Likely-to-be-Used (LLU)

   – Access probabilities based on:
       • Current user navigational patterns over site graph
         (in the form of clickstreams)
       • Historical user navigational patterns over site graph
         (in the form of association rules)

Site Graph: News -> Sports -> Hockey -> {Schedules, Players, Teams, Scores}

   (News, Sports, Hockey) -> Schedules = 20%
   (News, Sports, Hockey) -> Players   = 15%
   (News, Sports, Hockey) -> Teams     = 10%   <- LLU
   (News, Sports, Hockey) -> Scores    = 55%

                                                                                               28
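
The LLU choice on this slide amounts to picking the cached fragment with the lowest predicted access probability. The sketch below assumes the probabilities are already available (how they are derived from clickstreams and association rules is not modelled), and the function name llu_victim is hypothetical.

```python
from typing import Dict, Iterable


def llu_victim(cached_fragments: Iterable[str],
               access_probability: Dict[str, float]) -> str:
    """Return the cached fragment least likely to be used next.

    Fragments without a known probability are treated as probability 0.
    """
    return min(cached_fragments, key=lambda f: access_probability.get(f, 0.0))


# Probabilities from the slide for a user who has followed News, Sports, Hockey.
next_access = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}
victim = llu_victim(next_access.keys(), next_access)  # "Teams"
```

With the slide's numbers, Teams (10%) is the replacement victim.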
                Cache Invalidation

• DCA supports common cache invalidation techniques:
   – Time-based: Each cache element assigned a TTL
   – Event-based: Updates to the database send an invalidation
     message to the cache
   – On demand: Manual invalidation of selected elements

• DCA supports additional invalidation techniques….




                                                                 29
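
The three common techniques can be sketched on one small cache. This is a hedged illustration, not DCA code: FragmentCache and on_database_update are hypothetical names, and time is passed in explicitly rather than read from a clock.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


class FragmentCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def put(self, key: str, value: Any, now: float, ttl: float) -> None:
        """Time-based: every element is stored with its own TTL."""
        self._entries[key] = (value, now + ttl)

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:          # TTL elapsed: drop the element
            del self._entries[key]
            return None
        return value

    def invalidate(self, key: str) -> None:
        """On demand: manually drop a selected element."""
        self._entries.pop(key, None)


def on_database_update(cache: FragmentCache, affected_keys: Iterable[str]) -> None:
    """Event-based: a database update sends invalidations to the cache."""
    for key in affected_keys:
        cache.invalidate(key)
```

In this sketch the mapping from a database update to the affected cache keys is supplied by the caller; a real deployment would need some way to derive it.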
             Cache Invalidation…
• Other invalidation techniques supported:
   – Observation-based
      • User-initiated updates are observed in scripts; each such
        update sends an invalidation message to the cache
      • Most appropriate for auction sites, online trading sites
      • Invalidation does not require communication with the
        databases
   – Keyword-based:
      • Elements can be associated with keywords; e.g., a retailer
        may wish to invalidate all “seasonal” items
   – Regular expression-based:
      • Elements can be invalidated based on regular expression
        matching

                                                                     30
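
Keyword-based and regular expression-based invalidation can be sketched as follows. The class KeywordCache, its method names, and the key format "product/<id>" are assumptions made for illustration; the regular expression is matched against the element's key.

```python
import re
from typing import Any, Dict, Iterable, List, Set, Tuple


class KeywordCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Any, Set[str]]] = {}  # key -> (value, keywords)

    def put(self, key: str, value: Any, keywords: Iterable[str] = ()) -> None:
        self._entries[key] = (value, set(keywords))

    def keys(self) -> List[str]:
        return sorted(self._entries)

    def invalidate_keyword(self, keyword: str) -> int:
        """Drop every element associated with the keyword; return how many."""
        doomed = [k for k, (_, kws) in self._entries.items() if keyword in kws]
        for k in doomed:
            del self._entries[k]
        return len(doomed)

    def invalidate_matching(self, pattern: str) -> int:
        """Drop every element whose key matches the regular expression."""
        regex = re.compile(pattern)
        doomed = [k for k in self._entries if regex.search(k)]
        for k in doomed:
            del self._entries[k]
        return len(doomed)
```

For the retailer example, tagging product fragments with "seasonal" lets one call to invalidate_keyword("seasonal") remove all of them at once.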
        Performance Study…



Test Site
   – Fictitious online retail site, allows browsing of product
     catalog
   – Pages generated using JSP scripts
   – Site content stored in Oracle database
   – Database schema based on Dublin Core Metadata Open
     Standard
   – Contains 200,000 products and 44,000 categories
   – Each page consists of 3 components, each involving a
     database call
                                                             31
     Performance Study…


Test Setup
  – Content Database Server:
      Oracle 8.1.6

  – Web/Application Server:
     WebLogic 6.0 running on cluster of 2 machines

  – Server machines:
     have 1 GB RAM, dual P III-933 MHz processors
     run Windows 2K Advanced Server
                                                     32
           Testing Methodology...



•    Baseline Parameters:
    – Cache Size, i.e., percentage of fragments that fit into
       cache: 75%
    – Cache replacement policy: LLU

•    User load is varied by sending requests from client
     machines running Radview’s WebLoad

•    Simulated users navigate site according to Zipf 80-20
     distribution (i.e., 80% of users follow 20% of navigation
     links)                                                      33
           Performance Impact
80% faster response times through existing application infrastructure




                 Source: Fortune 100 client results                34
Chutney Throughput Impact
  250% increase in transaction rates




     Source: Fortune 100 client results   35
Alternative: CDNs

                 e.g., Akamai

[Figure: Sources, Repositories, and Clients, linked by Content Distribution
 Networks; labelled "Push Based Core Infrastructure"]

                                          36
                     Conclusion
• Increased use of dynamic page generation technologies
  => increases load on application servers
  => serious performance and scalability problems
      for e-business sites

• DCA (Dynamic Content Acceleration)
  => significantly reduces the load on the server side
  infrastructure, allows e-business sites to scale
  => significantly outperforms existing middle tier caching
  solutions
                                                          37
  IIT Bombay’s aAQUA Community Forum

Farmers get information and
get their questions answered
-- In the local context
-- In their local language



Capitalizes on existing human and
infrastructural resources:
Agri-extension center – KVK, Baramati
NGO – Vigyan Ashram, Pabal
Government – MCIT
                                        www.aAQUA.org
    Access over low bandwidth:
      Resource Optimization
Resource constraints
   Low/unpredictable bandwidth
     => disconnected operation/access

Exploit
   caching
   prefetching (through prediction of future needs)
   profiling by user type, location
     => offline aAQUA

Data characteristics
   Static data – text, images – land records, photos
       can be cached/hoarded
   Dynamic data – weather/price information
       cached info needs to be refreshed carefully
   Continuous media – VoIP, video data                 39

								