Characterizing Web Workload of Mobile Clients

           Chuang Yu
           Juha Raitio




            HUT T-110.456        2005/2/23
                                                Outline
    • Web workload analyses
       • What
       • Why
       • How

    • Characteristics of workload
       • Wireline
       • Wireless

    • Case study results
    • Statistical characteristics of Web workload
       • Power laws
       • Self-similarity

    • Examples of workload analyses tools
    • Summary




Characterizing Web Workload of Mobile Clients   HUT T-110.456   Yu & Raitio 2005/2/23
                                                What?
    • Content Analysis
    • User behavior analysis
       • User load distribution
       • Session duration
       • Temporal stability
       • Spatial locality

    • System load analysis


    • How do users come to visit the web site?
    • Why do users leave the web site?
    • What contents are users interested in?
    • How do users’ interests vary over time?
    • How do users’ interests vary across different geographic regions?



                                                Why?
    • Characteristics of user load have significant implications on
       • Web site design
       • Content management
       • Protocol design
       • Capacity planning

    • Content provider: Enhance user experience through more effective
      design and content management
    • Service provider: Efficient resource allocation, capacity planning, and
      pricing
    • System designer: Shed light on performance bottlenecks and
      effectiveness of protocols




                                                How?
    • Gathering requirements: what are the goals of the analysis?
    • Planning and designing the data collection
       • What data to collect?
       • Over how long a period?
       • From where? Web proxies, Web browsers and Web servers
       • What is the scope? How large? How many?
       • What methods to use? What analyses are needed? How to analyze
         the data?
    • Collecting data
    • Analyzing the traces with statistical and mathematical methods
    • Running the different analyses
       • Content analysis
       • User behavior analysis
       • System load analysis
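
As an illustration of the trace-analysis step, a first pass over collected logs often groups requests into per-user sessions before computing session durations. The sketch below is only one way to do this: the record format `(user, timestamp)`, the function names, and the 1800-second inactivity timeout are assumptions, not something the slides specify.

```python
from collections import defaultdict

def sessionize(requests, timeout=1800):
    """Group (user, timestamp) requests into per-user sessions.

    A new session starts when the gap to the user's previous request
    exceeds `timeout` seconds (an assumed threshold, not from the slides).
    Returns {user: [[t1, t2, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user, ts in requests:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout:
                user_sessions.append([cur])    # gap too long: open a new session
            else:
                user_sessions[-1].append(cur)  # same session continues
        sessions[user] = user_sessions
    return sessions

def session_durations(sessions):
    """Duration (last minus first timestamp) of every session."""
    return [s[-1] - s[0] for per_user in sessions.values() for s in per_user]
```

The resulting durations can then be fed into the distribution analyses discussed later in the deck.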




          Wireline user workload characterization (1): Content analysis
    • Content type
       • Pure text
       • Graphics-rich multimedia
       • Majority mix of both
    • Content size
       • Size of all contents in a web server
       • Size of content that is transferred by a web server
       • A non-negligible fraction of files is very large
       • Median transfer size is ~2 kB; median content size is a few hundred
         bytes larger
    • Content popularity
       • Highly depends on where traces are collected
    • Content Modification Pattern
       • Large variation in modification patterns: many contents are never
         modified, while some are modified at least once between two
         consecutive accesses
       • Depends on content type, e.g. news web sites
       • Most file modifications are small
       • A file's past modification interval gives a rough prediction of its
         future modification time


          Wireline user workload characterization (2): User behavior analysis

    • User request arrival and duration
       • Occur at three levels: session, click and request
       • User dependent
       • The number of clicks in a session, the number of embedded images
         in a web page, think time, and active time can be modeled with
         heavy-tailed Pareto distributions
       • 8-second rule
    • Temporal locality and stability
       • If a page is accessed now, how likely is it to be accessed
         again in the near future?
       • Stronger temporal locality implies that caching would be effective
       • Access ranking stability: stability is high on the scale of days
    • Spatial locality
       • Captures how likely people in the same geographic location or at the
         same organization are to request a similar set of documents
       • Indicates the effectiveness of proxy caching
       • Organization and domain membership is significant
       • “Hot” events dominate the membership
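
To make the Pareto modeling of quantities such as think time concrete, the following sketch estimates the Pareto shape parameter from samples by maximum likelihood. It assumes the lower bound `x_min` is known; the function name and the idea of testing it on synthetic data are illustrative assumptions, not the method used in the cited studies.

```python
import math

def pareto_alpha_mle(samples, x_min):
    """Maximum-likelihood estimate of the Pareto shape parameter alpha.

    Uses only samples >= x_min (the assumed lower bound of the tail):
        alpha_hat = n / sum(ln(x_i / x_min))
    """
    tail = [x for x in samples if x >= x_min]
    return len(tail) / sum(math.log(x / x_min) for x in tail)
```

A smaller estimated alpha means a heavier tail; for alpha at or below 2 the variance of the fitted distribution is infinite, which is why averages over such data are unstable.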


          Wireline user workload characterization (3): System load analysis

    • Load varies with time and with recent events, e.g. World Cup, Sept 11
    • Self-similar web traffic




         Wireless user workload characterization
    • WAP traffic
       • Access rate is still low, 80,000 entries in 7 months (99)
       • Amount of data is less than voice

    • Metropolitan wireless network
       • Usage behavior shows diurnal and weekly pattern
       • Users do not move frequently

    • WLAN
       • Campus: session-oriented and chat-oriented traffic; incoming traffic
         exceeds outgoing traffic; high degree of roaming within sessions;
         sessions are normally short
       • Conference: users are evenly distributed across APs; Web and
         SSH account for 64% of traffic; sessions are short, 60% last less
         than 10 min; bandwidth distribution is highly uneven across APs
       • Corporate: different users impose different loads




                                                Case study
    • “A popular commercial Web site designed for Mobile clients”
        • Provides Web access for wireline, wireless and offline use
        • Provides notification services


    • Analyses
       • Web access
       • Notifications
       • Comparison between Web access and notification use
       • Comparison between wireline and wireless use


    • Motivation
       • To give a general overview of the analysis process and data
       • To show some more concrete results
       • To illustrate possibilities of the analyses
       • To propose direct implications of results




                                      Case study - architecture




                                                Case study - material
    • Web access logs
       • for 12 days (August 2000)
       • per user
       • per request

    • Notification logs
       • for 6 days
       • per user
       • per notification


    • Types of Web access




                                   Case study – Web content
    • What content was available for wireless use?




                          Case study – Web content size
    • How did retrieved content vary in size?




              •   Replies are small:
                    • 98% of replies for wireless are less than 3kB
                    • 98% of replies for offline are less than 6kB
                    • 80% of bytes are carried in replies of size 10kB or more


    • Implications: systems could be optimized for small replies

                  Case study – Web content popularity
    • How did popularity vary across documents?




              •   Heavy tailed distribution
                    • 0.1-0.5% of documents account for 90% of the requests


    • Implications: caching could be very effective
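
A concentration figure of this kind (a small share of documents serving most requests) could be computed from a request trace roughly as follows. This is a minimal sketch: the function name and the trace format (a flat list of document identifiers, one per request) are assumptions, not the case study's actual tooling.

```python
from collections import Counter

def fraction_of_docs_for_share(requests, share=0.9):
    """Smallest fraction of distinct documents that covers `share` of requests.

    Documents are taken in order of decreasing popularity until their
    cumulative request count reaches share * total.
    """
    counts = sorted(Counter(requests).values(), reverse=True)
    target = share * len(requests)
    covered = 0
    for i, c in enumerate(counts, start=1):
        covered += c
        if covered >= target:
            return i / len(counts)
    return 1.0
```

The smaller the returned fraction, the more skewed the popularity distribution and the smaller the cache needed to absorb most of the load.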


         Case study – Web user load distribution
    • How did individual users contribute to the load?




              •   Heavy tailed distribution
                    • Small group of users generate majority of the load


    • Implications: different pricing for different user groups needed

               Case study – stability of Web access
    • How did interest vary during weekdays?




              •   Interests are relatively stable
                     • Of the top 100 most popular requests, 80% remain popular during a week
                     • Of the top 1000, 70%


    • Implications: performance can be optimized over the stable set
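
One simple way to quantify this kind of stability is to compare the top-N request sets of two observation periods. The sketch below assumes each period's trace is a list of requested URLs; the function names and the overlap definition are illustrative assumptions and may differ from the measure used in the case study.

```python
from collections import Counter

def top_n(requests, n):
    """Set of the n most frequently requested items in a trace."""
    return {item for item, _ in Counter(requests).most_common(n)}

def stability(period_a, period_b, n):
    """Fraction of period A's top-n items that are also in period B's top-n."""
    a, b = top_n(period_a, n), top_n(period_b, n)
    return len(a & b) / len(a)
```

A value close to 1 means the popular set barely changes between the two periods, so optimizations tuned on one period should carry over to the next.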
                  Case study – locality of Web access
    • Did people in the same region issue similar requests?




              •   Randomly sampled user groups don’t differ from local users
                    • Geographic locality in requests is insignificant

    • Implications: geographic distribution of servers/content does not
      require localization
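
The comparison between local and randomly sampled user groups can be sketched as below. The sharing metric (fraction of a group's requests that go to documents requested by at least two group members), the data layout, and all names are assumptions chosen for illustration; the case study may define locality differently.

```python
import random

def shared_fraction(user_requests, group):
    """Fraction of the group's requests that go to documents
    requested by at least two distinct users in the group."""
    users_per_doc = {}
    for user in group:
        for doc in user_requests[user]:
            users_per_doc.setdefault(doc, set()).add(user)
    total = sum(len(user_requests[u]) for u in group)
    shared = sum(1 for u in group for d in user_requests[u]
                 if len(users_per_doc[d]) >= 2)
    return shared / total if total else 0.0

def random_baseline(user_requests, group_size, trials=100, seed=0):
    """Mean shared fraction over randomly sampled groups of the same size."""
    rng = random.Random(seed)
    users = list(user_requests)
    samples = [shared_fraction(user_requests, rng.sample(users, group_size))
               for _ in range(trials)]
    return sum(samples) / trials
```

If `shared_fraction` for a regional group is not noticeably higher than `random_baseline` for groups of the same size, geographic locality is insignificant in the sense used on this slide.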

                   Case study – notification popularity
    • What types of content were available as notifications, and how popular
      were they?




                              Case study – notification size
    • How did notification messages vary in size?




              •   Notifications are small
                    • All messages contain less than 256 bytes


    • Implications: if delivery is not optimized, overhead caused by network
      protocols may be considerable
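
A rough back-of-the-envelope calculation shows why overhead matters for such small messages. The sketch assumes minimal IPv4 (20 bytes) plus TCP (20 bytes) headers per packet and lets the caller choose how many header-only control packets (handshake, ACKs, teardown) accompany each message; these figures and the function name are assumptions, not measurements from the case study.

```python
def overhead_fraction(payload_bytes, header_bytes=40, control_packets=0):
    """Share of transmitted bytes that is protocol overhead.

    header_bytes=40 assumes minimal IPv4 (20) + TCP (20) headers.
    control_packets counts additional header-only packets sent per
    message (an assumed number, e.g. for connection setup and teardown).
    """
    overhead = header_bytes * (1 + control_packets)
    return overhead / (overhead + payload_bytes)
```

For a 256-byte notification, `overhead_fraction(256)` is 40/296, and with six extra control packets per message it rises to 280/536, i.e. more than half of the bytes sent.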

                   Case study – notification popularity
    • How did popularity vary across notifications?




              •   Heavy tailed distribution
                    • Top 1% notifications accounted for 60% of messages


    • Implications: multicasting notifications would yield significant savings
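
The potential saving can be estimated from a delivery log under an idealised assumption: every distinct notification is transmitted once by multicast instead of once per recipient. The log format (one notification id per unicast message) and the function name are hypothetical.

```python
def multicast_savings(deliveries):
    """Fraction of transmissions saved by idealised multicast.

    `deliveries` holds one notification id per unicast message sent.
    Assumes a single multicast transmission replaces all copies of
    the same notification, which ignores group management costs.
    """
    if not deliveries:
        return 0.0
    return 1 - len(set(deliveries)) / len(deliveries)
```

The more skewed the notification popularity, the closer this upper bound gets to 1.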


      Case study – notification load distribution
    • How did individual users contribute to the notification load?




              •   Heavy tailed distribution
                    • Top 5% of clients received 25% of notification messages
                    • Top 10% received 40%


    • Implications: different pricing for different user groups needed
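
Shares such as "top 5% of clients received 25% of messages" can be derived from per-user totals as sketched here. The input format (a mapping from user id to message or request count) and the function name are assumptions for illustration.

```python
def top_share(load_per_user, top_fraction):
    """Share of total load generated by the heaviest `top_fraction` of users.

    load_per_user maps a user id to that user's load (e.g. message count).
    At least one user is always included in the top group.
    """
    loads = sorted(load_per_user.values(), reverse=True)
    k = max(1, round(top_fraction * len(loads)))
    return sum(loads[:k]) / sum(loads)
```

Evaluating `top_share` for several fractions (for example 0.01, 0.05, 0.10) gives a compact picture of how unevenly load is spread across users.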

                  Case study – locality of notifications
    • Did people in the same region receive the same notifications?




              •   Randomly sampled user groups differ from local users
                    • Users in same regions share notification content

    • Implications: regional differences may be utilized in planning of
      geographic distribution of servers/content

      Correlation between browsing and notification
    • Limited correlation between a client’s notification and browsing usage
    • People use the two services for different purposes; the two services
      deliver different types of content
    • The result is useful for web design and pricing plans




    [Figure: number of users who have overlap between their top N browsing categories and top N notification categories]
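
The overlap count plotted on this slide could be reproduced along the following lines. The sketch assumes that each user's categories are already ranked by usage and stored as lists; the data layout and function name are hypothetical.

```python
def users_with_overlap(browse_top, notify_top, n):
    """Count users whose top-n browsing categories and top-n notification
    categories have at least one category in common.

    Both arguments map a user id to a list of categories ranked by usage.
    Users missing from notify_top are treated as having no notifications.
    """
    count = 0
    for user, cats in browse_top.items():
        if set(cats[:n]) & set(notify_top.get(user, [])[:n]):
            count += 1
    return count
```

Plotting this count against N shows how quickly (or slowly) the two services' interests start to overlap.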


          Workload comparison between wireline and mobile web
    • Comparison in content
       • Wireline web content is richer than wireless content
       • Content size is smaller in wireless because of limited display and
         bandwidth
       • Wireless content shares the Zipf-like popularity distribution of
         wireline content
    • Comparison in user behavior
       • Both are user dependent
       • Both exhibit temporal stability
       • Wireless users do not exhibit strong spatial locality (limited
         content)
    • Comparison in system load
       • Both exhibit diurnal and weekly variation
       • Wireless server load is smaller than wireline server load
       • Web sites for mobile clients have a more heterogeneous population of
         users



                                                Power laws
    • A measure y follows a power law in another measure x when y is
      proportional to the a-th power of x:

          $y = C\,x^{a}$

      so that log y depends linearly on log x with slope a




    • Power-law distributions (a.k.a. heavy-tailed distributions) include e.g. the
      Zipfian and Pareto distributions
    • Why? Finding a suitable distribution for observed data allows probabilistic
      inference on the underlying phenomenon in closed form
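
Because a power law is a straight line on log-log axes, its exponent can be estimated with an ordinary least-squares fit of log y against log x. The sketch below does exactly that; the function name is an assumption, and this simple regression is a quick diagnostic rather than a statistically rigorous estimator for distribution tails.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = C * x**a by least squares on log y = log C + a * log x.

    Returns (a, C). All xs and ys must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                  # slope on log-log axes = exponent
    c = math.exp(my - a * mx)      # intercept gives the constant C
    return a, c
```

For rank-frequency data, `xs` would be the popularity ranks and `ys` the request counts, and a negative `a` describes how fast popularity falls off with rank.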

                                      Power laws and the Web
    • Several distributions derived from the topology of the Internet at
      router and domain level follow a power law
        • Number of documents per Web site or file system
        • Size of documents per Web site or file system
        • Session durations
        • Links between web pages
        • Example (a = -0.46):




                                                Self-similarity
    • Self-Similar (a.k.a. fractal) data:
       • Maintains its bursty characteristic even when aggregated over
         a wide range of time scales
       • Slowly decaying variance
       • Long range dependence (not memoryless)


    • Underlying phenomenon
       • Data generators which are either ON or OFF
       • The distributions of ON and OFF times (or message sizes) are
         heavy-tailed
       • Aggregation of these data leads to self-similarity



    • Internet/WWW traffic is self-similar
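
The ON/OFF explanation can be explored with a small simulation: superpose several sources whose ON and OFF period lengths are Pareto distributed, then estimate the Hurst parameter H from how slowly the variance of block averages decays (slope 2H - 2 on log-log axes). Everything here is an illustrative assumption, including the tail index 1.4, the block sizes, and the choice of the variance-time estimator, which is only one of several in use.

```python
import math
import random

def on_off_source(length, alpha, rng):
    """0/1 series with Pareto(alpha)-distributed ON and OFF period lengths."""
    series = []
    state = rng.random() < 0.5
    while len(series) < length:
        # Heavy-tailed period, capped so the list never outgrows `length`.
        period = min(int(rng.paretovariate(alpha)), length - len(series))
        series.extend([1 if state else 0] * period)
        state = not state
    return series

def aggregate(n_sources, length, alpha=1.4, seed=0):
    """Sum of n_sources independent ON/OFF series (number of active sources)."""
    rng = random.Random(seed)
    total = [0] * length
    for _ in range(n_sources):
        for i, v in enumerate(on_off_source(length, alpha, rng)):
            total[i] += v
    return total

def hurst_variance_time(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(block mean) against log block size."""
    xs, ys = [], []
    for m in block_sizes:
        means = [sum(series[i:i + m]) / m
                 for i in range(0, len(series) - m + 1, m)]
        mu = sum(means) / len(means)
        var = sum((b - mu) ** 2 for b in means) / len(means)
        xs.append(math.log(m))
        ys.append(math.log(var))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1 + slope / 2
```

For independent (memoryless) samples the estimate should be near 0.5, while the aggregated ON/OFF traffic should give a clearly larger value, reflecting long-range dependence.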




                                   Self-similarity and the Web




          WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis
    • An extended and improved version of ProWGen (Proxy Workload
      Generator), including a GUI interface to a useful set of tools for Web
      traffic modelling and analysis
    • Purpose: To facilitate the easy generation and analysis of
      controllable and representative workloads for Web caching
      simulations
    • The WebTraff toolkit provides three main functions:
       • Web workload trace generation
       • Web workload trace analysis
       • Web proxy cache simulation

    • Graphs displayed in PostScript format




                                          WebTraff GUI Interface




                                   Web Workload Generation




                                         Web Workload Analysis




    • Two main categories of analysis functions:
       • Time series analysis (on the left)
       • Web workload analysis (on the right)
    • Radio buttons, slide bars and text boxes are available to control plotting
      characteristics




                                            Requests per Interval
                                              (time series plot)







                                    Popularity Distribution plot





              Document Size Distribution (zoomed)




                              Web Proxy Cache Simulation




  • Application-level caching simulation parameters
     • Cache size
     • Cache replacement policy

  • Five replacement policies currently available
     • Random replacement (RAND)
     • First-In-First-Out (FIFO)
     • Least-Recently-Used (LRU) (default setting)
     • Least-Frequently-Used (LFU)
     • Greedy-Dual-Size (GDS)
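
To illustrate what such a cache simulation does, here is a minimal stand-alone sketch of the LRU policy driven by a synthetic Zipf-like request stream. It is not WebTraff's code: the function names, the Zipf exponent 0.8, and measuring cache size in number of documents (rather than bytes) are simplifying assumptions.

```python
import random
from collections import OrderedDict

def zipf_requests(n_docs, n_requests, s=0.8, seed=0):
    """Synthetic trace: document of popularity rank r is drawn with weight 1/r**s."""
    rng = random.Random(seed)
    weights = [1 / rank ** s for rank in range(1, n_docs + 1)]
    return rng.choices(range(n_docs), weights=weights, k=n_requests)

def lru_hit_ratio(requests, cache_size):
    """Replay a trace through an LRU cache holding `cache_size` documents."""
    cache = OrderedDict()          # keys ordered from least to most recently used
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            cache[doc] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)
```

Running `lru_hit_ratio` over the same trace with increasing cache sizes gives a hit-ratio curve, which is the typical output of this kind of simulation.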




               For More Information about WebTraff


  • WebTraff toolkit:
           •     http://www.cpsc.ucalgary.ca/~carey/software.htm

  •      “ProWGen: A Synthetic Workload Generation Tool for the Simulation
         Evaluation of Web Proxy Caches”
          • Busari/Williamson, Computer Networks, Vol 38, No 6, June 2002
          • http://www.cpsc.ucalgary.ca/~carey/publications.htm


  •      Contact information:
          • Email {carey,nayden}@cpsc.ucalgary.ca




                                                Summary
    • Workload characterization provides information that is useful for making
      better decisions on
        • Web site/application design
        • Content management
        • Protocol design
        • Capacity planning
        • Service pricing
        • etc.
    • Workload characterization can be gained through
        • Gathering requirements for the analyses
        • Planning of data acquisition
        • Statistical analyses of the data
        • Mathematical modeling
    • There are tools for workload characterization
    • Power-law and self-similarity characteristics of load make the Web
      different from the good old telephony world
        • The same models and optimizations do not necessarily apply in
          both worlds


                                                References

    • Adya A, Bahl P, Qiu L. “Characterizing Web Workload of Mobile
      Clients”, in “Content Networking in the Mobile Internet”, Ch. 5,
      Dixit S, Wu T (eds), 2004
    • Adya A, Bahl P, Qiu L. “Characterizing Alert and Browse Services for
      Mobile Clients”, 2002
    • Kramer G. “Self-similar Network Traffic”, 2001
    • Fischer MJ, Fowler TB. “Fractals, Heavy-Tails, and the
      Internet”, 2001
    • Markatchev N, Williamson C. “WebTraff: A GUI for Web Proxy Cache
      Workload Modelling and Analysis”, Department of Computer Science,
      University of Calgary, 2002



