
              Guide to the Clickstream Data

           Petr Berka
University of Economics, Prague
         berka@vse.cz
           Web Usage Mining Domain
   click-stream - a sequential series of page view
    requests (a page view is what is displayed in the
    user's browser at one time),
   server session - the click-stream of page views
    for a single user on a particular web site,
   user session - the click-stream of page views
    for a single user across the entire web.

                          Clickstream Data,
                      Discovery Challenge 2005      2
                     The Clickstream Data
   About 3.6 million records (24 days) from the web
    server log of a www shop
   Each record contains the time, IP address,
    session ID, page request and referer
   There are hundreds of thousands of sessions;
    most of them are very short; the average length
    is 16 pages
   Each page request in this www shop has the same
    structure – page type / content ID (product ID)
   Page types are for example dp (detail of product),
    sb (shopping basket), ct (contact)
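
The page type / content ID structure can be illustrated with a small parser. This is only a sketch: the function name `split_page_request` is hypothetical, and it assumes the content ID is carried in an `id` query parameter, as in the sample records (for example `/dp/?id=124`). Other request shapes may exist in the full log.

```python
from urllib.parse import urlsplit, parse_qs


def split_page_request(request):
    """Return (page_type, content_id) for a request such as '/dp/?id=124'.

    Hypothetical helper: assumes the first path segment is the page type
    and that the content ID, if any, is the 'id' query parameter.
    """
    parts = urlsplit(request.strip())
    # First non-empty path segment is the page type; the root page '/' has none.
    segments = [s for s in parts.path.split("/") if s]
    page_type = segments[0] if segments else None
    # Take the first 'id' value if the query string carries one.
    ids = parse_qs(parts.query).get("id")
    content_id = ids[0] if ids else None
    return page_type, content_id
```

Under these assumptions, `split_page_request("/dp/?id=124")` yields the page type `dp` with content ID `124`, while `/sb/` yields `sb` with no content ID.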

                                   Example of the Data

unix time ;IP address    ; session ID                     ; page request     ; referer

1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124      ;www.google.cz;
1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182      ;
1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/                ;www.seznam.cz;
1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148      ;/dp/?id=124;
1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/             ;/dp/?id=148;
1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/       ; /;
1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/             ;/sb/;
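
The records above can be read and grouped into server sessions with a few lines of code. The sketch below assumes semicolon-separated fields padded with spaces and an optional trailing semicolon, as in the sample; the field names in `FIELDS` and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical field names, following the header of the sample.
FIELDS = ("time", "ip", "session_id", "request", "referer")

SAMPLE = [
    "1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124      ;www.google.cz;",
    "1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182      ;",
    "1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/                ;www.seznam.cz;",
    "1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148      ;/dp/?id=124;",
    "1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/             ;/dp/?id=148;",
    "1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/       ; /;",
    "1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/             ;/sb/;",
]


def parse_line(line):
    """Turn one log line into a dict keyed by FIELDS."""
    # Split on ';' and trim the padding around each value.
    values = [v.strip() for v in line.rstrip("\n").split(";")]
    # Pad a missing referer and drop the empty field after a trailing ';'.
    values = (values + [""] * len(FIELDS))[:len(FIELDS)]
    record = dict(zip(FIELDS, values))
    record["time"] = int(record["time"])
    return record


def group_by_session(lines):
    """Map each session ID to its records, ordered by time."""
    sessions = defaultdict(list)
    for line in lines:
        if line.strip():
            record = parse_line(line)
            sessions[record["session_id"]].append(record)
    for records in sessions.values():
        records.sort(key=lambda r: r["time"])
    return dict(sessions)


sessions = group_by_session(SAMPLE)
```

On the sample, this gives three sessions; the session starting from www.google.cz contains four page requests and ends in the shopping basket.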




                                  Data Description
   table “obchod” (shop) - name of the internet shop (7
    entries),
   table “kategorie” (category) - info about category of
    products (64 entries),
   table “list” (sheet) - info about a specific product of a
    more detailed type (157 entries),
   table “znacka” (brand) - name of the producer or
    brand of a product (197 entries),
   table “tema” (theme) - info about themes discussed
    in the on-line advice (36 entries)
                      Data Summary (1/3)
   3 617 171 page requests
   522 410 sessions
       318 523 single page
       203 887 length > 1
       avg. length 16
       median 8
       mode 2
       longest 15454
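
A quick arithmetic check of these counts is sketched below. It suggests that the average length of 16 is probably computed over sessions with more than one page; this is an inference from the figures on this slide, not something the slide states. Variable names are illustrative.

```python
# Counts taken from the slide.
total_requests = 3_617_171
total_sessions = 522_410
single_page = 318_523
multi_page = 203_887

# The two session groups add up to the total number of sessions.
assert single_page + multi_page == total_sessions

# Mean length over all sessions.
mean_all = total_requests / total_sessions

# Mean length over sessions with length > 1: each single-page session
# accounts for exactly one request, so subtract those first.
mean_multi = (total_requests - single_page) / multi_page

print(f"all sessions: {mean_all:.2f}, length > 1: {mean_multi:.2f}")
```

The mean over all sessions comes out at about 6.9 pages, while the mean over sessions with length > 1 is about 16.2 pages.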
                     Data Summary (2/3)
   time spent during a session
       avg. time 00:24:46
       median 00:03:08
       mode 00:00:09
       longest 433:27:53
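
One plausible way to obtain such figures is to take the difference between the last and the first timestamp of a session. The slides do not say how time spent was measured, so the sketch below is an assumption, and the helper names are hypothetical.

```python
def session_duration(timestamps):
    """Seconds between the first and the last request of a session.

    Assumption: time spent = last unix time - first unix time, so a
    single-page session has duration 0.
    """
    return max(timestamps) - min(timestamps)


def format_hms(seconds):
    """Format a number of seconds as hh:mm:ss (hours may exceed 99)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Timestamps of one session from the example data.
example = [1074589200, 1074589233, 1074589245, 1074589290]
duration = session_duration(example)
```

For the example session this gives 90 seconds, which `format_hms` renders as 00:01:30.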




                  Data Summary (3/3)
           Shop visits
distribution of sessions with length > 1

[Pie chart. Legend: 10;www.shop1.cz, 11;www.shop2.cz, 12;www.shop3.cz,
14;www.shop4.cz, 15;www.shop5.cz, 16;www.shop6.cz, 17;www.shop7.cz.
Slice labels: 6%, 18%, 13%, 10%, 10%, 19%, 24%.]



