What is Web log analysis?
Jim Jansen
College of Information Sciences and Technology
The Pennsylvania State University
jjansen@acm.org
Let's make this a discussion!

Outline
• Definition
• Examples
• Theory and Essential Construct
• Data Collection
• Method
• Discussion
Web log analysis is part of the domain of …
• ... Web analytics
• The Web Analytics Association (WAA) defines Web analytics as the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage (http://www.webanalyticsassociation.org/)
• Shares common theoretical and methodological characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)

W3C Extended Log Format
W3C Extended Log Format: a variety of fields for examining visitors to Web sites.
The other common format is the NCSA Separate Log, which is composed of three logs (Common log – actions on the server; Referral log – where they came from; and Agent log – stuff about the client computer).
A variety of tools help make sense of this log data.

Other Log Examples …
Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
Twitter log: Tweets with author in XML
Keyword advertising logs provide calculated metrics
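As a rough illustration of the W3C Extended Log Format described above, the sketch below reads a log in which a `#Fields:` directive names the columns and every other non-directive line is one entry. The sample log, its field selection, and the function name `parse_w3c_extended` are made up for illustration; a production log will typically carry more fields and needs more careful handling of quoted values.

```python
def parse_w3c_extended(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as "#Version:" and "#Fields:" describe the log itself.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip entries that do not match the declared field list.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Made-up sample log, for illustration only.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2009-03-01 10:15:02 192.0.2.10 GET /index.html 200
2009-03-01 10:15:09 192.0.2.10 GET /search 200
2009-03-01 10:16:40 192.0.2.44 GET /missing.html 404
"""

entries = list(parse_w3c_extended(SAMPLE_LOG.splitlines()))
visitors = {entry["c-ip"] for entry in entries}
print(len(entries), "entries from", len(visitors), "visitors")
```

Because each entry becomes a dictionary, enriching the log with additional fields amounts to adding keys to each entry.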
Theoretical Foundations
• Part of the behaviorism paradigm
• Behaviorism – an approach focused on the outward behavioral aspects of thought that emphasizes the observed behaviors
• Behaviorism – Pavlov, Watson, & Skinner
[Portraits: Ivan Petrovich Pavlov, John B. Watson, Burrhus Frederic Skinner]

Behaviorism Characteristics
• Inductive, data-driven and characterized by empirical observation of measurable behavior
• Grounded on somebody doing something in a situation (all the environmental and situational features are embedded behaviors)
• Critics of behaviorism as a psychological theory have issues with its rejection of mental processes. I agree - people are more than “mediators between behavior and the environment” (Skinner, 1993, p 428)
What is a Behavior?
… an observable activity of a person, animal, team, organization, or system.
One can classify behaviors into three general categories. Behaviors are
• something that one can detect and record
• actions or specific goal-driven events with some purpose other than the specific action that is observable
• reactive responses to environmental stimuli

What is a Behavior?
• Behavior is the essential construct of behaviorism and of log research
• Logs record behaviors of users and systems (records behavior but can’t tell affective, cognitive, or situational aspects)
• A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value)
Ethograms
• a taxonomy or index of behavioral patterns
• details the different forms of behavior that a user exhibits
• categories of behavior are objective, discrete, not overlapping. This makes the definitions of each behavior (and category of behaviors) clear, detailed and distinguishable from each other
What about the data collection method?

Example of an Ethogram
Behavior | Description of the behavior
View results | Interaction in which the user viewed or scrolled one or more pages from the results listing. If a results page was present and the user did not scroll, we counted this as a View Results Page.
    With Scrolling | User scrolled the results page.
    Without Scrolling | User did not scroll the results page.
    but No Results in Window | User was looking for results, but there were no results in the listing.
Selection | Interaction in which the user makes a selection in the results listing.
    Click URL (in results listing) | Interaction in which the user clicked on a URL of one of the results in the results page.
    Next in Set of Results List | User moved to the Next results page.
    Previous in Set of Results List | User moved to the Previous results page.
    GoTo in Set of Results List | User selected a specific results page.
View document | Interaction in which the user viewed or scrolled a particular document in the results listings.
    With Scrolling | User scrolled the document.
    Without Scrolling | User did not scroll the document.
Execute | Interaction in which the user initiated an action in the interface.
    Execute Query | Interaction in which the user entered, modified, or submitted a query
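One way to apply an ethogram like the one above is to code each raw log event into exactly one behavior. The sketch below is a minimal illustration: the behavior labels come from the example ethogram, while the raw action names (`submit_query`, `results_page`, and so on), the `scrolled` and `has_results` flags, and the sample session are hypothetical.

```python
from collections import Counter


def code_behavior(action, scrolled=False, has_results=True):
    """Map one raw log event to exactly one (behavior, sub-behavior) pair."""
    scroll_label = "With Scrolling" if scrolled else "Without Scrolling"
    if action == "submit_query":
        return ("Execute", "Execute Query")
    if action == "results_page":
        if not has_results:
            return ("View results", "but No Results in Window")
        return ("View results", scroll_label)
    if action == "click_result":
        return ("Selection", "Click URL (in results listing)")
    if action == "next_page":
        return ("Selection", "Next in Set of Results List")
    if action == "previous_page":
        return ("Selection", "Previous in Set of Results List")
    if action == "goto_page":
        return ("Selection", "GoTo in Set of Results List")
    if action == "document":
        return ("View document", scroll_label)
    # Categories must be discrete: refuse anything the ethogram does not define.
    raise ValueError(f"no ethogram entry for action {action!r}")


# Hypothetical session, for illustration only.
session = [
    {"action": "submit_query"},
    {"action": "results_page", "scrolled": True},
    {"action": "click_result"},
    {"action": "document", "scrolled": False},
]

coded = [code_behavior(**event) for event in session]
counts = Counter(behavior for behavior, _ in coded)
for behavior, n in counts.items():
    print(f"{behavior}: {n}")
```

Raising an error for unknown actions, rather than guessing, keeps the coded categories non-overlapping and makes gaps in the ethogram visible.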
Data Collection: Trace Data
• One can view the data collected in log files as trace data.
• People conducting the activities of their daily lives often create things, create marks, induce wear, or reduce some existing material.
• Within the confines of research, these things, marks, and wear become data
• Classically, trace data are the physical remains of people’s interaction
[Images: Wear on a carpet; Trash heap; Computer storage media]

Trace Data
• In the past, trace data was often time consuming to gather and process, making such data costly.
• Logging software makes collecting trace data easy and cheap
• Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data
• With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective
What is cool about trace data for researchers?
Data Collection
Log data has significant advantages as a data collection approach for the study and investigation of behaviors, including:
• Scale: not a limiting factor as in lab user studies
• Power: large sample size for inference testing; in fact, so large that one must account for the size effect
• Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context
• Location: can collect in distributed environments
• Duration: can collect log data over an extended period

Methodological Foundations
Use of logs to collect trace data is an unobtrusive method (a.k.a., non-reactive or low-constraint).
Unobtrusive methods …
• allow data collection without directly interfering in the context and
• do not require a direct response from participants
[Images: Customer Behavior (video); Chemistry (surface marking)]
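The Power bullet above can be made concrete with a small calculation. The sketch below reads "size effect" as effect size (an assumption) and uses made-up numbers: two groups of 100,000 observations whose means differ by only 0.02 standard deviations. It compares a large-sample z-test p-value with Cohen's d, which is one common effect-size measure and not necessarily the one the author has in mind.

```python
import math
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(pooled_var)


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up data: 100,000 observations per group, means 10.00 and 10.02, SD about 1.
group_a = [9.0, 11.0] * 50_000
group_b = [9.02, 11.02] * 50_000

p = z_test_p_value(group_a, group_b)
d = cohens_d(group_a, group_b)
print(f"p-value: {p:.2e}")
print(f"Cohen's d: {d:.3f}")
```

With samples this large the difference is statistically significant, yet the standardized difference is far below the conventional 0.2 threshold for a "small" effect, which is why significance alone says little at log scale.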
Methodological Foundations
Three justifications for unobtrusive methods:
• Uncertainty principle: researchers interjected into an environment become part of the system
  Example: ethnography studies (where the researcher “bird dogs” a study participant)
• Observer effect: difference that is made to an activity or a person’s behaviors by being observed
  Example: no one searches for porn in a lab study of Web searching
• Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect
  Example: this is why medical trials are double blind rather than single blind
Trace data helps in overcoming the Uncertainty principle, Observer effect, and Observer bias in the data collection. Note: this applies to data collection but not to data analysis.

Methodological Foundations
These are inherent characteristics of the log data collection method; as a result, Web analytics has issues to address:
• Abstraction – how does one relate low-level data to higher-level concepts?
• Selection – how does one separate the necessary from unnecessary data?
• Reduction – how does one reduce the complexity and size of the data set?
• Context – how does one interpret the significance of events?
• Evolution – how can one collect data without impacting application deployment or use?
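The first three issues listed above (abstraction, selection, reduction) can be sketched on a toy log. Everything in the example is hypothetical: the records, the URL layout, the rule that `/static/` requests are unnecessary, and the concept names are assumptions chosen only to show the three steps side by side.

```python
from collections import Counter

# Hypothetical raw records: (client_ip, requested_path).
RAW = [
    ("192.0.2.10", "/search?q=logs"),
    ("192.0.2.10", "/static/logo.png"),
    ("192.0.2.10", "/doc/17"),
    ("192.0.2.44", "/search?q=web"),
    ("192.0.2.44", "/static/site.css"),
]


def is_necessary(path):
    """Selection: drop requests for page furniture (assumed unnecessary here)."""
    return not path.startswith("/static/")


def to_concept(path):
    """Abstraction: relate a low-level path to a higher-level concept."""
    if path.startswith("/search"):
        return "query"
    if path.startswith("/doc/"):
        return "view document"
    return "other"


selected = [(ip, path) for ip, path in RAW if is_necessary(path)]

# Reduction: collapse individual records into counts per (visitor, concept).
summary = Counter((ip, to_concept(path)) for ip, path in selected)

for (ip, concept), n in sorted(summary.items()):
    print(ip, concept, n)
```

Context and evolution are not addressed by this sketch; interpreting the significance of an event generally needs information that the log itself does not record.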
Recap of Web Analytics
Type of Data: Trace
Data Collection: Unobtrusive
Key Construct: Behavior
Theoretical Foundation: Behaviorism
[Diagram labels: User, Computer, Query, Response, Click]

Research
• Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis, Hershey, PA: Idea Group Publishing
  – First chapter on theory of log analysis is free!
• Lecture: Jansen, B. J. (Forthcoming) Understanding User – Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed). Morgan-Claypool: San Rafael, CA.
  – manuscript about Web Analytics, soup to nuts
Thank you!
(open for questions and further discussion)
Jim Jansen
College of Information Sciences and Technology
The Pennsylvania State University
jjansen@acm.org