IWMW 2005: Whose web is it anyway?
Lies, Damn lies and Web Statistics

Dr. Mike Lowndes,
Interactive Media Manager,
Natural History Museum, London
   – Houses 350 permanent scientific staff, plus postgraduate
   students; one of the largest UK research institutes in the
   natural sciences.


•   Why bother?
•   Issues with web logs
•   Issues with analytic tools
•   Browser tracking
•   Comparison between approaches
•   Known issues with browser tracking
•   Nedstat input and findings from Newcastle
                        Why bother?

• Web log analysis is currently the main method used to
  quantify web site usage for reporting.
• Results are used by the government as performance
  indicators for institutional websites.
• Not accurate or meaningful most of the time
   – no good for absolute measurement of usage.
Can be used for:
• Trend analysis
• Content preferences
• ROI estimation
• Checking and fixing your site
• Understanding user behaviour
• Testing assumed pathways
                     Issues with server logs
•   Dynamic IP
     – Many users using the same IP number over time.
     – Same user assigned many IP numbers over time.
•   Proxies
     – Several or many users behind 1 IP number
•   Caches (can be ‘in’ Proxies)
     – Commonly requested files cached closer to the users.
     – Can form the top 20-50 hosts accessing sites.
•   Robots and spiders
     – Few visits but lots of hits.
     – Analytic packages cannot keep up to date with all of them for exclusion.
•   Syndication
     – RSS feeds generate huge logs, but are not ‘read’ by humans initially.
     – Click-through configuration.
•   Reporting by analysis tools
     – Often weekly or monthly reports: realtime is very labour/server intensive
     – Reports often complex and techy.
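Two of the distortions listed above, robots and shared IP numbers, can be seen directly in a raw log. The following is a minimal sketch, assuming the server writes the Combined Log Format; the robot marker list, the sample log lines and the function names are all made up for illustration, and a real exclusion list would be far longer (and, as noted above, never complete).

```python
import re
from collections import Counter, defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical and deliberately incomplete list of robot markers.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def hits_per_host(lines, exclude_robots=True):
    """Count hits per IP number, optionally dropping robot traffic."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip malformed lines
        if exclude_robots and is_robot(match["agent"]):
            continue
        counts[match["host"]] += 1
    return counts


def agents_per_host(lines):
    """Collect the distinct user agents seen behind each IP number."""
    agents = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is not None:
            agents[match["host"]].add(match["agent"])
    return agents


# Made-up sample: two different browsers behind one IP (a proxy), plus a robot.
SAMPLE = [
    '192.0.2.10 - - [01/Jul/2005:10:00:00 +0100] "GET /visiting/index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [01/Jul/2005:10:00:05 +0100] "GET /visiting/map.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X) Safari/412"',
    '198.51.100.7 - - [01/Jul/2005:10:00:06 +0100] "GET /robots.txt HTTP/1.0" 200 120 "-" "ExampleBot/1.0 (+http://example.org/bot)"',
    '198.51.100.7 - - [01/Jul/2005:10:00:07 +0100] "GET /visiting/index.html HTTP/1.0" 200 5120 "-" "ExampleBot/1.0 (+http://example.org/bot)"',
]

print(hits_per_host(SAMPLE))
print(hits_per_host(SAMPLE, exclude_robots=False))
```

In this sample the host 192.0.2.10 looks like one visitor to an IP-based count, although two different browsers sit behind it, and half of all hits disappear once the robot is excluded.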
                Issues with log analysis tools

• WebTrends vs Summary SP
•   1. Natural History Museum
     – Summary SP (Version 4.2.1, unregistered demo, default configuration)
•   2. UKOLN (Bath)
     – WebTrends (Version 5, default configuration)

• Both tools were applied to the same log file
• Default configurations – not removing robots
     – Note: WebTrends documentation not clear on this point
               Measurement discrepancies

Measurement              Summary SP   WebTrends 7
Connections (hits)       -            +0.67%
Page views (page hits)   -            +5.00%
Visits (user sessions)   -            +0.07%
Failed hits              -            +0.30%
Average visit duration   -            -30.0% (+250%)
IE                       75%          86%
Netscape compatible      2%           4%

Top Level Domains        US           US
                         UK           UK
                         AUS          CAN
                         NETHER       NETHER
                         CAN          AUS
                         JAP          JAP
           Comparison between tools

• Not a single measurement was identical.
• Most measurements were within 5% of each other.
• Visit duration measurement widely different, and
  can depend on configuration. Possible bug in
  WebTrends version 5.
• Page view measurements were quite different.

Results broadly similar but direct comparisons,
 especially of Page Views, are not really justified.
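One reason visit counts and visit durations can depend on configuration is the inactivity timeout a tool uses to split a visitor's requests into visits. The sketch below illustrates the effect under that assumption; it does not reproduce either tool's actual algorithm, and the request times and function names are invented.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout_minutes=30):
    """Group one visitor's request times into visits split by inactivity."""
    timeout = timedelta(minutes=timeout_minutes)
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= timeout:
            visits[-1].append(ts)  # still within the current visit
        else:
            visits.append([ts])  # gap too long: start a new visit
    return visits


def average_duration(visits):
    """Mean of (last request - first request); single-hit visits count as zero."""
    total = sum((v[-1] - v[0] for v in visits), timedelta())
    return total / len(visits)


# Made-up request times for one visitor, in minutes after 10:00.
start = datetime(2005, 7, 1, 10, 0)
requests = [start + timedelta(minutes=m) for m in (0, 2, 5, 25, 27, 70)]

for timeout in (10, 30):
    visits = sessionize(requests, timeout)
    print(timeout, len(visits), average_duration(visits))
```

With this data a 10-minute timeout yields three visits averaging 2 min 20 s, while a 30-minute timeout yields two visits averaging 13 min 30 s: the same log, but very different visit figures.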
                  Browser tracking

•   Does it have fewer inaccuracies and distortions?
•   Is it easier on the web team?
•   Is it affordable?
•   Does it give us more information / better
                Browser tracking

• Requires code to be added to pages
• Uses an image, sourced from the tracking website.
  Also uses JavaScript and cookies for gathering
  extended and repeat-visit information
• Usually hosted services
• Provide near real-time tracking
• Few of the issues distorting logs affect these
  measurements (according to the blurb)
• Main players: Nedstat, Nielsen//NetRatings,
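To make the image-plus-cookie mechanism concrete, here is a rough sketch of what a tracking service might do when a tagged page requests its image. It is not any vendor's implementation: the cookie name, the `page` query parameter and the function name are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_beacon(query_string, cookie_header=""):
    """Record one page view from a tracking-image request.

    Returns (record, set_cookie). set_cookie is None for a returning
    visitor, otherwise the Set-Cookie value that issues a new visitor ID.
    """
    params = parse_qs(query_string)
    page = params.get("page", ["unknown"])[0]

    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if COOKIE_NAME in cookies:
        visitor_id = cookies[COOKIE_NAME].value
        set_cookie = None
        returning = True
    else:
        # No cookie: first visit, or the visitor blocked or deleted it.
        visitor_id = uuid.uuid4().hex
        set_cookie = f"{COOKIE_NAME}={visitor_id}; Path=/"
        returning = False

    record = {"page": page, "visitor": visitor_id, "returning": returning}
    return record, set_cookie


first, cookie = handle_beacon("page=/visiting/index.html")
second, _ = handle_beacon(
    "page=/visiting/map.html",
    cookie_header=f"{COOKIE_NAME}={first['visitor']}",
)
print(first["returning"], second["returning"])
```

Because the image is fetched from the tracking site on every page view, the request is counted even when the page itself came from a proxy or browser cache; the cookie is what lets repeat visits be tied to the same browser.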
           Comparison between tools

• Summary SP vs Nielsen//NetRatings
• Run on one section of a site over a month.
• ‘Visiting’ section of the Natural History Museum site
  – small but popular and easily tagged.
                        Results 1 – visits and visitors

value                                 Browser track   Log analysis   Difference
Visits / User sessions                       27,663         40,402         -32%     35,395
Visits per day (ave)                            922          1,347                   1,180
Visits per visitor per month (ave)              1.1            1.7                     1.5
Unique visitors (browsers)                   25,127         23,585                  23,084

Pages per visit (ave)                          3.31              3                     2.1
Visit duration (ave)                          02:09          07:13                   04:08

Page impressions                             91,506        117,447                  71,895
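The -32% figure can be reproduced from the first two columns. The sketch below assumes the first column is browser tracking and the second is log analysis, which is an inference from the conclusions later in the talk; the function name is illustrative.

```python
def relative_difference(measured, reference):
    """Percentage by which `measured` differs from `reference`."""
    return (measured - reference) / reference * 100


# (first column, second column) taken from the table above.
# Assumption: first = browser tracking, second = log analysis.
ROWS = {
    "Visits / user sessions": (27_663, 40_402),
    "Unique visitors (browsers)": (25_127, 23_585),
    "Page impressions": (91_506, 117_447),
}

for label, (browser, log) in ROWS.items():
    print(f"{label}: {relative_difference(browser, log):+.0f}%")
```

On that reading, browser tracking reports about 32% fewer visits and 22% fewer page impressions than log analysis, but about 7% more unique visitors.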
              Results 2 – pages viewed

value                            Browser track   Log analysis
Top 10
index.html (Visiting home)              31,117         28,591
where are we? page                      17,897         26,566
planning your visit page                 6,835         16,773
events calendar page                     9,221          9,369
howtogethere -local map page             4,700          5,005
access guide introduction page           1,978          4,653
travel details page                      3,550          3,668
facilities page                          2,767          3,497
activities page                          3,293          3,375
multilingual info.                         828          1,901
top ten totals                          82,186        103,398
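A quick arithmetic check of the table above: the sketch confirms the column totals and lists the pages where log analysis reports fewer views than browser tracking. The short page keys are abbreviations chosen here, not names from the tools.

```python
# (browser tracking, log analysis) page views from the table above.
PAGES = {
    "index.html": (31_117, 28_591),
    "where are we?": (17_897, 26_566),
    "planning your visit": (6_835, 16_773),
    "events calendar": (9_221, 9_369),
    "howtogethere": (4_700, 5_005),
    "access guide introduction": (1_978, 4_653),
    "travel details": (3_550, 3_668),
    "facilities": (2_767, 3_497),
    "activities": (3_293, 3_375),
    "multilingual info": (828, 1_901),
}


def totals(pages):
    """Sum each column across all pages."""
    browser = sum(b for b, _ in pages.values())
    log = sum(l for _, l in pages.values())
    return browser, log


def undercounted_by_logs(pages):
    """Pages where the log figure is lower than the browser-tracking figure."""
    return [name for name, (b, l) in pages.items() if l < b]


print(totals(PAGES))
print(undercounted_by_logs(PAGES))
```

Only the section home page shows a lower figure in the logs, which fits the idea that the most popular, most heavily cached page is the one a server log misses.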
                     Results 3 – country

                                 Browser track   GeoIP (Summary)
     Countries                   UK 75%          UK 62%
                                 US 5%           US 8%
                                 Spain           Spain
                                 Italy           Netherlands
                                 Netherlands     Germany
                                 France          Italy
                                 Germany         France
                                 Belgium         Canada
                                 Poland          Poland
•   Depends on the quality of the geographical IP database, not
    the mode of tracking?
     Conclusions regarding traditional log analysis

Assuming browser tracking is more accurate…
• We have fewer visit sessions than we thought, but
  more visitors
  – Fewer visits (sessions), possibly due to robot exclusion
  – More visitors (unique users), possibly due to the masking
    effect of proxies/caches and browser caches
• Visit duration is much shorter than thought
  – possibly due to robots/spiders and cache updating.
• Country information is roughly accurate so long as a
  geographical lookup is used.
• Activity of popular pages, which are often cached,
  will be underestimated
          Browser tracking advantages

• Almost real-time analysis, incremental data.
• Better repeat user tracking and individual pathway
• Configurable, graphical reports for non-techies
  – Techie still needs to configure those reports however, as
    an understanding of web analytics is required
• Cut our monthly staff time down from 1.5 days to 1
• Appear to be more accurate in describing the
  activity of real people, but we would like to see
  some independent research.
                  Issues with browser tracking
• Setup is not trivial: You need to add code to every page.
   – Multiple server / ownership issues.
• Does not always work (or get full user details) if JavaScript is turned
  off or cookies are disallowed.
• Does not work with text-only browsers.
• Unknown compatibility with PDAs, mobiles etc.
• Would we get different results with different hosted services?
   – ABCe: industry standards for measurement
• Cookies often deleted unless user is confident in the source?
   – This would affect the measurement of repeat visitors and behaviour
Political issues:
• Issues with external hosting of institutional data
• Security of personal data issues with external hosting
   – E.g. measurements of student and staff use of a VLE.
                   Next steps

• Many private sector and public sector sites have
  already moved to browser tracking.
• About 6 National Museums are currently discussing
  hosted browser tracking.
• 5 Universities currently involved in a trial of
Thank you