Web Analytics
Raw web logs contain a very large amount of data
Raw web logs are not sessionized
– You have to decide what to take out
(requests for graphics?)
– You have to decide what questions you want to
ask…
– You have to ask your customers / clients
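A minimal sketch of what sessionizing a raw log can look like. The 30-minute inactivity timeout, the (visitor, timestamp, url) tuple layout, the suffix list, and the function names are all assumptions made for illustration; none of them come from a specific log format or tool.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout: 30 minutes is a common convention, not a rule.
SESSION_TIMEOUT = timedelta(minutes=30)
# Assumed list of suffixes that mark requests for graphics.
GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def drop_graphics(hits, suffixes=GRAPHIC_SUFFIXES):
    """Remove hits whose URL ends in one of the graphic suffixes."""
    return [hit for hit in hits if not hit[2].lower().endswith(suffixes)]


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, url) hits into per-visitor sessions.

    A new session starts whenever a visitor has been idle longer than
    `timeout`. Returns a list of sessions, each a list of hits in time order.
    """
    last_session = {}  # visitor -> that visitor's most recent session
    sessions = []
    for hit in sorted(hits, key=lambda h: h[1]):
        visitor, timestamp, _url = hit
        current = last_session.get(visitor)
        if current is None or timestamp - current[-1][1] > timeout:
            current = []
            sessions.append(current)
            last_session[visitor] = current
        current.append(hit)
    return sessions


hits = [
    ("10.0.0.1", datetime(2010, 1, 5, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2010, 1, 5, 9, 0), "/logo.gif"),
    ("10.0.0.1", datetime(2010, 1, 5, 9, 10), "/products.html"),
    ("10.0.0.2", datetime(2010, 1, 5, 9, 12), "/index.html"),
    ("10.0.0.1", datetime(2010, 1, 5, 11, 0), "/index.html"),
]
sessions = sessionize(drop_graphics(hits))
print(len(sessions))  # 3: two sessions for 10.0.0.1, one for 10.0.0.2
```

Identifying a visitor by IP address alone, as the sample data does, is a further simplification; which key to use is one of the decisions the analyst has to make.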
What is visualization (www.oed.com) ?
1. The action or fact of visualizing; the power or
process of forming a mental picture or vision of
something not actually present to the sight; a
picture thus formed.
2. The action or process of rendering visible
Another ...
Visualization is the use of graphical techniques to
communicate information and support reasoning
or analysis
Visualizations are cost-effective because they exploit
– powerful human visual processing capabilities
and
– high quality graphics created at low cost
Two kinds of visualizations
– Scientific Visualization
– Information Visualization
What is scientific visualization
Visual modelling of scientific data using computer
graphics
Examples
Visualization of brain models
http://www.loni.ucla.edu/SVG/
Focus is
– on modelling (visually) the input data as close to reality as
possible
– Not on presenting abstractions or relationships from the input
data
Why do we create visualizations?
Picture worth 1000 words
Bring attention to certain relationships
Cut through noise/To attract attention
Organize information
To aid quick understanding
Help understanding without words
Combine information and get new
information from combination
To persuade
To identify patterns
Why do we create visualizations?
Answer questions
Make decisions
See data in context
Expand memory
Support graphical calculation
Find patterns
Present argument
Tell a story
Inspire
Three functions of visualizations
Record information
– Photographs, blueprints, …
Support reasoning about information (analyze)
– Process and calculate
– Reason about data
– Feedback and interaction
Convey information to others (present)
– Share and persuade
– Collaborate and revise
– Emphasize important aspects of data
Record information
Napoleon’s 1812 campaign in Russia
Input data
– Size of army
at the start of the campaign = 422,000
at the end of the campaign = 10,000
– Location of the army (2 dimensions)
– Direction of the army’s movement
– Temperature and
– Time
Created by French engineer Charles Minard…1869
from Tufte Book…
Minard’s drawing was
Considered the best graphic ever produced
– Inspiration for modern IV researchers
Plots all the data corresponding to all the six input variables
Clearly shows the message underlying the input data
– Gradual reduction in the size of the army
– Linked to the gradual fall in temperatures
Input data is complex
Yet, most important information abstracted out and presented in a
simple graphic
Record Data – answer questions
Gallop, Bay horse “Daisy” --- [Muybridge– 1884-86]
Support Reasoning .....
Mystery: what is causing a cholera
epidemic in London in 1854?
Visualization for Problem Solving
Illustration of John Snow’s deduction that a cholera epidemic
was caused by a bad water pump, circa 1854.
Horizontal lines indicate location of deaths.
Crosses indicate pumps.
From Visual Explanations by Edward Tufte, Graphics Press, 1997
Find patterns
More patterns
Convey information to others....
London Subway Map Example
Abstract away details for easier
understanding
William Playfair, 1786
telling a story
The New York Times Spring 2007 women's fashion issue
included a funny and compelling visual explanation that compared the price
per square inch of handbags to the price per square foot of real estate in and
around NYC.
Goals of visualization research
Understand how visualizations convey
information to people
– What do people perceive/comprehend?
– How do visualizations correspond with mental
models?
Develop principles and techniques for creating
effective visualizations
– Amplify perception and cognition
– Strengthen connection between visualization
and models of data
Data and Image Models
The Big Picture
[Diagram: components of the big picture]
Task
Data: physical type (int, float, etc.); abstract type (nominal, ordinal)
Domain: metadata, semantics, conceptual model
Processing: algorithms
Mapping: visual encoding, visual metaphor
Image
Topics
Properties of Data
Properties of the image
Mapping data to the image
Data
Data models vs. Conceptual models
Data models are low level descriptions of the data
– Math: Sets with operations on them
– Example: integers with + and × operators
Conceptual models are mental constructions
– Include semantics and support reasoning
Examples (data vs. conceptual)
– (1D floats) vs. Temperature
– (3D vector of floats) vs. Space
Taxonomy of visual representations
1D (sets and sequences)
Temporal
2D (maps)
3D (shapes)
nD (relational)
Trees (hierarchies)
Networks (graphs)
Types of variables
Physical types
– Characterized by storage format
– Characterized by machine operations
– Example:
bool, short, int32, float, double, string, …
Abstract types
– Provide descriptions of the data
– May be characterized by methods/attributes
– May be organized into a hierarchy
– Example:
plants, animals, metazoans, …
Basic Numeric Data
Nominal (qualitative)
– (no inherent order)
– city names, types of diseases, ...
Ordinal (qualitative)
– (ordered, but not at measurable intervals)
– first, second, third, …
– cold, warm, hot
– Mon, Tue, Wed, Thu …
Quantitative
– integers or real numbers
Nominal, ordinal and quantitative
N - Nominal (labels)
– Fruits: Apples, oranges, …
O – Ordered
– Quality of meat: Grade A, AA, AAA
Q - Interval (Location of zero arbitrary)
– Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
– Like a geometric point. Cannot compare directly
– Only differences (i.e. intervals) may be compared
Q - Ratio (zero fixed)
– Physical measurement: Length, Mass, Temp, …
– Counts and amounts
– Like a geometric vector, origin is meaningful
S. S. Stevens, On the Theory of Scales of Measurement, 1946
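As a rough illustration of the four scale levels above, the sketch below records which comparison and arithmetic operations become meaningful at each level, with higher levels inheriting those of lower levels. The names `Scale`, `NEW_OPERATIONS` and `permitted_operations` are hypothetical, chosen only for this example.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' levels of measurement, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level.
NEW_OPERATIONS = {
    Scale.NOMINAL: ["=", "!="],   # labels can only be tested for equality
    Scale.ORDINAL: ["<", ">"],    # ordered values can be ranked
    Scale.INTERVAL: ["-"],        # differences (intervals) can be compared
    Scale.RATIO: ["/"],           # a fixed zero makes ratios meaningful
}


def permitted_operations(scale):
    """Return every operation meaningful at `scale`, including inherited ones."""
    operations = []
    for level in Scale:
        if level <= scale:
            operations.extend(NEW_OPERATIONS[level])
    return operations


print(permitted_operations(Scale.ORDINAL))   # ['=', '!=', '<', '>']
print(permitted_operations(Scale.INTERVAL))  # ['=', '!=', '<', '>', '-']
```

For example, two dates (interval scale) can be subtracted to get a span, but "twice as late" has no meaning, whereas two lengths (ratio scale) can be divided.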
Nominal, ordinal and quantitative
N - Nominal (labels)
– Operations: =, ≠
O – Ordered
– Operations: =, ≠, <, >
Q - Interval (Location of zero arbitrary)
– Operations: =, ≠, <, >, -
Q - Ratio (zero fixed)
– Operations: =, ≠, <, >, -, ÷
Value
– Nominal
– Quantitative
– Order: little order
Color
– Nominal
– Hues might give you order?
Shape
– Nominal
Orientation
– Nominal
Encoding Rules
Univariate data
[Figure: example data table with columns A, B, C and rows 1, 2, 3; labels: Factors, Variables]
Univariate data
Bivariate data
Trivariate Data
Three Variables
Two variables [x,y] can map to points
e.g., scatterplots, maps, …
Third variable [z] must use …
– Color, size, shape, …
Large design space (visual metaphors)
Multidimensional? How many variables?
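One way to picture the trivariate case: x and y go to position, and z goes to another visual variable, here the area of the mark. This is only a sketch; the canvas size, the area range and the helper names `linear_scale` and `encode_trivariate` are assumptions made for the example.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function mapping [domain_min, domain_max] linearly onto the range."""
    span = domain_max - domain_min

    def scale(value):
        if span == 0:  # all data values equal: use the middle of the range
            return (range_min + range_max) / 2
        t = (value - domain_min) / span
        return range_min + t * (range_max - range_min)

    return scale


def encode_trivariate(rows, width=400, height=300, min_area=10, max_area=200):
    """Map (x, y, z) rows to marks: x, y -> position, z -> mark area."""
    xs, ys, zs = zip(*rows)
    scale_x = linear_scale(min(xs), max(xs), 0, width)
    scale_y = linear_scale(min(ys), max(ys), height, 0)  # screen y grows downward
    scale_z = linear_scale(min(zs), max(zs), min_area, max_area)
    return [
        {"cx": scale_x(x), "cy": scale_y(y), "area": scale_z(z)}
        for x, y, z in rows
    ]


marks = encode_trivariate([(1, 10, 5), (2, 20, 15), (3, 15, 25)])
print(marks[1])  # {'cx': 200.0, 'cy': 0.0, 'area': 105.0}
```

Swapping `scale_z` for a mapping onto color or shape gives a different point in the design space without changing how x and y are handled.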
What you know now
Attributes of visual variables
– Position, size
– Shape, value
– Orientation, color
– Texture
Characteristics of visual variables
– Nominal
– Quantitative
– Order
Same information but stated differently: ranking
of applicability of properties for different data
types
(Mackinlay 88, Not Empirically Verified)
QUANT             ORDINAL           NOMINAL
Position          Position          Position
Length            Density           Color Hue
Angle             Color Saturation  Texture
Slope             Color Hue         Connection
Area              Texture           Containment
Volume            Connection        Density
Density           Containment       Color Saturation
Color Saturation  Length            Shape
Color Hue         Angle             Length
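The ranking above can be read as a lookup table: for each variable, take the highest-ranked property that is still free. The sketch below does this greedily. The greedy strategy, the `CAPACITY` rule (position may be used twice, for x and y) and all names are assumptions made for illustration, not Mackinlay's actual algorithm.

```python
from collections import Counter

# Rankings copied from the table above, best first (only the rows shown there).
RANKING = {
    "quantitative": ["position", "length", "angle", "slope", "area",
                     "volume", "density", "color saturation", "color hue"],
    "ordinal": ["position", "density", "color saturation", "color hue",
                "texture", "connection", "containment", "length", "angle"],
    "nominal": ["position", "color hue", "texture", "connection",
                "containment", "density", "color saturation", "shape", "length"],
}

# Assumption: position can carry two variables (x and y); others carry one.
CAPACITY = {"position": 2}


def assign_properties(variables):
    """Greedily give each (name, data_type) the best property still available.

    Variables are processed in the order given, so list the most important first.
    """
    used = Counter()
    assignment = {}
    for name, data_type in variables:
        for prop in RANKING[data_type]:
            if used[prop] < CAPACITY.get(prop, 1):
                used[prop] += 1
                assignment[name] = prop
                break
    return assignment


chart = assign_properties([
    ("year", "quantitative"),
    ("currency", "quantitative"),
    ("imports/exports", "nominal"),
])
print(chart)
# {'year': 'position', 'currency': 'position', 'imports/exports': 'color hue'}
```

With two quantitative variables and one nominal variable, the result matches the familiar line chart: both quantities on the axes and the category distinguished by color.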
Deconstructions ... Mapping
Stock chart from 90s
x-axis time (Q)
y-axis price (Q)
Playfair again
x-axis:year (Q)
y-axis: currency (Q)
Color: imports/exports (O, N)
http://www.smartmoney.com/marketmap/
Wattenberg 1998
rectangle size: market cap (Q)
rectangle position: market sector (N), market cap (Q)
color hue: loss vs. gain (N, O)
color value: magnitude of loss or gain (Q)
7 “USER INTERACTION” tasks
The 7 interactive tasks users wish to perform:
– Overview: Gain an overview of the entire
collection.
– Zoom : Zoom in on items of interest
– Filter: filter out uninteresting items.
– Details-on-demand: Select an item or group and
get details when needed.
– Relate: View relationships among items.
– History: Keep a history of actions to support
undo, replay, and progressive refinement.
– Extract: Allow extraction of sub-collections and
of the query parameters.
7 data types
1D Linear – univariate data
2D Map – bivariate data
3D World – trivariate data
Multi-dimensional – multidimensional data
Temporal
Tree
Network
Linear data
Long lists of items
– E.g. long lists of menu items and
– Software code listings etc.
Bifocal (or Fisheye) displays
– E.g. Fisheye menus developed by HCI Lab, UMD
– http://www.cs.umd.edu/hcil/fisheyemenu/
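To give a feel for how a fisheye display allocates space along a list, here is a sketch using the graphical fisheye transform of Sarkar and Brown, G(x) = (d+1)x / (dx+1). This is a swapped-in, well-known formulation chosen for illustration; it is not claimed to be what the UMD Fisheye menus implement, and the function name and default distortion factor are assumptions.

```python
def fisheye(position, focus, distortion=3.0):
    """Map a normalized list position in [0, 1] to its distorted position.

    Items near `focus` are spread apart (magnified); items far away are
    squeezed toward the ends. `distortion` = 0 leaves positions unchanged.
    """
    if position == focus:
        return position
    boundary = 1.0 if position > focus else 0.0
    d_max = boundary - focus          # signed distance from focus to the edge
    x = (position - focus) / d_max    # normalized distance from focus, 0..1
    g = (distortion + 1) * x / (distortion * x + 1)
    return focus + g * d_max


# Eleven evenly spaced menu items, focus on the middle one.
positions = [i / 10 for i in range(11)]
distorted = [fisheye(p, focus=0.5) for p in positions]

print(round(fisheye(0.6, focus=0.5), 3))  # 0.75: the neighbour moves away from the focus
print(fisheye(1.0, focus=0.5))            # 1.0: the ends stay fixed
```

The gap between the focus item and its neighbour grows from 0.1 to 0.25 of the display, while items near the ends are packed more tightly.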
1D linear
This type of data includes:
– textual documents
– program source code
– lists of textual information
Issues to consider when designing for this type of
data:
– fonts and styles;
– overview;
– scrolling;
– selection methods;
This figure displays the whole
system source code.
Color coding is used to provide
information about the thousands
of lines of the system.
Here the newest lines are in red
and the oldest in green
This figure shows another example of textual
information visualization for 1D.
http://www.mcmaster.com/
2D Map
This type of data includes:
– geographic maps;
– floorplans;
– newspaper layout;
Issues to consider when
designing for this type of data:
– may use multiple 2D-layers;
– ease of finding adjacent
items;
– ease of establishing paths
Another -
http://bioinformatics.oxfordjournals.org/cgi/content/full/22/17/2166
3D data
Complex trees and networks are visualized
using 3D graphics
Initially used in scientific visualization, but
gradually being introduced into information
visualization
3D World
This type of data includes:
– items with volume;
– items with complex relationships
Issues to consider when designing
for this type of data:
– positioning;
– orientation;
– occlusion;
In this figure pages are presented to
the user on a virtual desktop, so that
interacting with a document is as fast
as handling a real piece of paper on a
desk. The idea here is to allow
users to group pages into books and
manipulate them as a whole.
Perspective Wall
Similar to Bifocal, except that it demagnifies at an increasing rate, while
Bifocal demagnifies at a constant rate
Visualizes linear information such as timeline
Adds 3D but uses excess real estate on screen
Slide adapted from Hornung &
Zagreus
Another
Demonstrations
http://www.inxight.com/products/sdks/tw/
Temporal Data – a form of 3d data
Traditionally time series are visualized using trend
graphs and seasonality graphs
– A time series can be expressed in terms of its
trend and seasonality components
– Data = trend + seasonal + remainder
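The additive model above can be sketched with the classical textbook procedure: estimate the trend with a centered moving average, average the detrended values per position in the cycle to get the seasonal component, and keep what is left as the remainder. The slides do not say which decomposition method is meant, so this choice, the function names, and the synthetic series are assumptions.

```python
def moving_average_trend(series, period):
    """Centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2:  # odd period: plain centered window
            total = sum(window)
        else:           # even period: half weight on the two end points
            total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend


def decompose(series, period):
    """Split a series into trend + seasonal + remainder (additive model).

    Assumes the series covers at least two full periods.
    """
    trend = moving_average_trend(series, period)
    # Collect detrended values for each position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the seasonal effects on zero
    seasonal = [means[i % period] - offset for i in range(len(series))]
    remainder = [
        None if t is None else y - t - s
        for y, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic series: linear trend plus a repeating 4-step pattern, no noise.
pattern = [2, -1, -2, 1]
series = [10 + i + pattern[i % 4] for i in range(12)]
trend, seasonal, remainder = decompose(series, period=4)
print(seasonal[:4])  # recovers the pattern: [2.0, -1.0, -2.0, 1.0]
```

Plotting `trend` and `seasonal` separately gives the trend graph and the seasonality graph mentioned above; wherever the trend is defined, the three components add back up to the original series.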
Trend And Seasonality in Time Series
Lifeline example
Visualization of computerised medical records
For a patient
– Horizontal lines (time lines) represent medical
problems, hospitalization and medications
– Icons on these lines represent events such as
tests and physician consultations
All the patient information is put on one screen
http://www.cs.umd.edu/hcil/lifelines/
Multi-Dimensional
This type of data includes:
– relational data;
– statistical data;
Issues to consider when
designing for this type of data:
– may be difficult for users to
comprehend the
multidimensional
representation.
This figure shows that a
multidimensional set of data is
extracted from Excel
Trees
This type of data includes:
– items presenting a
relationship with a parent
item;
Issues to consider when
designing for this type of data:
– breadth;
– depth
Another tree – a treemap
Networks
This type of data includes:
– items presenting a
relationship with an arbitrary
number of other items;
Issues to consider when
designing for this type of data:
– complexity of the
relationships between items;
– user's task;
This figure shows the major routers
in the Internet
Networks cont.
This figure shows the most densely used paths
for long-distance calls
Other Networks
Thinkmap http://www.thinkmap.com/
http://www.visualthesaurus.com/
http://w3.win.tue.nl/nl/onderzoek/onderzoek_informatica/visualization/sequoiaview//
To return to --- 7 Tasks involved in
interactive visualization
The seven basic tasks a user can perform:
– Overview
– Zoom
– Filter
– Details-on-demand
– Relate
– History
– Extract
Overview
Gain an overview of the entire
collection of the information
Zoom task
Zoom in on items of interest.
Smooth zooming helps users preserve their sense of position and
context.
See
http://micro.magnet.fsu.edu/primer/java/scienceopticsu/powersof10/index.html
http://www.lri.fr/~appert/website/orthozoom/videos/OZTypical.mov
Other zooming
Demos:
http://hcil.cs.umd.edu/video/1998/1998_pad.mpg (superseded by Piccolo, née Jazz)
– http://www.cs.umd.edu/hcil/piccolo/play/index.shtml
http://www-ui.is.s.u-tokyo.ac.jp/~takeo/research/autozoom/autozoom.htm
Filter Task
Take out the uninteresting items.
The goal is to give users easy
controls with rapid display
updates, no matter the amount of
data presented.
The figure shows a scrollbar that
lets the user select a single
compound for filtering.
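One common way to keep display updates rapid as the data grows is to sort the items once and answer each slider movement with a binary search instead of a full scan. The sketch below assumes items are dictionaries with a numeric attribute; the class name `RangeIndex` and the sample compounds and weights are hypothetical.

```python
import bisect


class RangeIndex:
    """Pre-sorted index answering 'attribute between low and high' queries."""

    def __init__(self, items, attribute):
        # Sort once up front so that every later query is a binary search.
        self._items = sorted(items, key=lambda item: item[attribute])
        self._keys = [item[attribute] for item in self._items]

    def query(self, low, high):
        """Return the items whose attribute value lies in [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._items[start:stop]


# Hypothetical data set: compounds with a molecular weight.
compounds = [
    {"name": "D", "weight": 180.2},
    {"name": "A", "weight": 18.0},
    {"name": "C", "weight": 58.4},
    {"name": "B", "weight": 44.0},
]
index = RangeIndex(compounds, "weight")

# Each slider movement becomes one cheap query.
visible = index.query(40, 60)
print([c["name"] for c in visible])  # ['B', 'C']
```

Each query costs a binary search plus the size of the result, so dragging a range slider stays responsive even when the full collection is large.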
Details on demand task
Select an item or a group and get
more details when needed, once
the entire collection has been
reduced to a few items.
This figure shows that once a set of
items is extracted, details-on-demand
is available for further
manipulation.
Another….
http://www.guardian.co.uk/flash/0,,1131346,00.html
Interaction with Scatter plots
Relate Task
View relationships among items.
By varying the value of one
attribute at a time, the
display is restricted to the
items whose value for this
attribute satisfies a chosen
relationship
This figure shows a data set where
color indicates a relationship
between items according to a
previously chosen parameter
http://www.cs.cmu.edu/Groups/sage/sdmwalk1.html
History Task
Keep a history of actions to support undo, replay,
and progressive refinement.
Why? Because it is very rare that a single action
produces the desired output.
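A minimal sketch of how undo, replay and progressive refinement can be supported with two stacks of view states. The `History` class, its method names and the dictionary used as view state are assumptions made for illustration, not a particular toolkit's API.

```python
class History:
    """Keeps every view state so actions can be undone, redone and replayed."""

    def __init__(self, initial_state):
        self._undo = [initial_state]  # states reached so far, oldest first
        self._redo = []               # states undone and available for redo

    @property
    def current(self):
        return self._undo[-1]

    def do(self, action):
        """Apply `action` (a function state -> new state) and record the result."""
        self._undo.append(action(self.current))
        self._redo.clear()  # a new action invalidates the redo branch

    def undo(self):
        if len(self._undo) > 1:
            self._redo.append(self._undo.pop())
        return self.current

    def redo(self):
        if self._redo:
            self._undo.append(self._redo.pop())
        return self.current

    def replay(self):
        """Return the sequence of states from the start to the current one."""
        return list(self._undo)


history = History({"zoom": 1, "filter": None})
history.do(lambda state: {**state, "zoom": 2})
history.do(lambda state: {**state, "filter": "weight < 60"})

print(history.undo())  # {'zoom': 2, 'filter': None}
print(history.redo())  # {'zoom': 2, 'filter': 'weight < 60'}
```

Because each action returns a new state instead of modifying the old one, earlier states stay intact and can be revisited or refined step by step.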
Extract Task
Allow extraction of sub-collections
and of the query parameters,
either for further analysis or for
saving separately, or even to drag
and drop the subset into another
application for further processing.
This figure shows how related items
are extracted visually while
preserving the relationships
(size) between elements
The relation of Cognition and
Visualization
With respect to the cognitive theory seen at the
beginning of this course, the following concepts
are involved in visualization:
– Attention
– Abstraction
– Affordances
Attention
Learning complex query languages or complex
information coding rules is distracting, and
prevents users from focusing on their information needs.
Users need to have:
– simple menus;
– direct-manipulation;
– simple visual coding rules;
– easily understandable metaphor
– appealing appearance
– meaningful animation
– sense of location/position
Abstraction
Visualization of abstract information (statistical
data, etc.) reveals patterns, gaps, clusters or
outliers.
– Proximity/relationships between items should
emerge
– Group of elements
Affordances
Affordances must be obvious to the users through
the use of:
– the proper representations;
– the proper metaphors;
– feedback about possible actions or new
affordances on an object;
Considerations when presenting data using visual
explanations
Your information should be clear
– Use white space to control the emphasis of elements and data within your design
– Use typography and the modular grid to balance the weight and flow of your design
– Use color in meaningful ways and don't let it overwhelm your data.
– Assume that the audience is intelligent. Even publications, such as NY Times,
assume that people are intelligent enough to read complex prose, but too stupid to
read complex graphics.
– Don't limit people by "dumbing" the data -- allow people to use their abilities to get
the most out of it.
– To clarify -- add detail (don't omit important detail; e.g., serif fonts are more
"detailed" than sans serif fonts but are actually easier to read).
A saying often attributed to Einstein: "an explanation should be as simple as possible, but
no simpler".
– Show the data. Graphical diagrams and maps are "intelligence made visible"
– Data rich plots can show huge amounts of information from many different
perspectives: cause & effect, relationships, parallels, etc.
– Plots need annotation to show data, data limitations, authentication, and exceptions
– Don't use graphics to decorate a few numbers, use them to enhance the meaning of
the data.
– Avoid dis-information: thick surrounding boxes and underlined sans serif text make
reading more difficult
Optimal use of graphic elements
Edward Tufte defines the data-ink ratio as:
Data-ink ratio = (data ink) / (total ink in the plot)
The goal is to make this as large as is reasonable. To do this you:
– Avoid heavy grids
– Replace box plots with interrupted lines
– Replace enclosing box with an x/y grid
– Use white space to indicate grid lines in bar charts
– Use ticks or dashes (w/o line) to show actual locations of x and y
data
– Prune graphics by: replacing bars with single lines; erasing non-data
ink; eliminating lines from axes; starting x/y axes at the data
values
– Avoid overly busy grids, excess ticks, redundant representation of
simple data, boxes, shadows, pointers, legends. Concentrate on
the data and NOT the data containers
– Always provide as much scale information as is needed
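A small worked example of the data-ink ratio defined above. The ink amounts (think of them as pixel counts) and the choice of which elements count as data ink are hypothetical; in practice that classification is a judgment call for each plot.

```python
def data_ink_ratio(ink_by_element, data_elements):
    """Data-ink ratio = ink spent on data / total ink in the plot."""
    total = sum(ink_by_element.values())
    if total <= 0:
        raise ValueError("the plot uses no ink")
    data = sum(
        amount for name, amount in ink_by_element.items()
        if name in data_elements
    )
    return data / total


# Hypothetical ink amounts for a bar chart before and after pruning.
before = {"bars": 1200, "heavy grid": 900, "enclosing box": 400, "shadows": 500}
after = {"bars": 1200, "light ticks": 300}

print(data_ink_ratio(before, {"bars"}))  # 0.4
print(data_ink_ratio(after, {"bars"}))   # 0.8
```

Removing the heavy grid, the enclosing box and the shadows doubles the ratio without touching the data itself.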
Colors can often enhance data
comprehension
Color grids are a form of layer which provides context but which should be
unobtrusive and muted
Pure bright colors should be reserved for small highlight areas and almost
never used as backgrounds.
Use color as the main identifier on computer screens as different objects are
often considered the same if they have the same color regardless of their
shape, size , or purpose
Contour lines that change color based on the background stand out without
producing the 1+1=3 effects
Colors can be used as labels, as measures, and to imitate reality (e.g., blue
lakes in maps).
Don't place bright colors mixed with white next to each other.
Color spots against a light gray are effective
Colors can convey multi-dimensional values
Note that surrounding colors can make two different colors look alike, and
two similar colors look very different
Subtle shades of color or gray scale are excellent to use and the differences
may be accentuated when they are delimited with fine darker contour lines
Be aware that 5-10% of people are color blind to some degree (red-green is
the most common type followed by blue-yellow, which usually includes blue-
green)
Evaluate your design by asking the right questions:
Does the display tell the truth?
Is the representation accurate?
Are the data documented?
Do the display methods tell the truth?
Are appropriate comparisons, contrasts, and
contexts shown?
http://www.turbulence.org/Works/nums/
http://www.karlhartig.com/chart/chart.html
http://www.textarc.org/
http://oursignal.com/
http://www.npr.org/templates/story/story.php?storyId=121875404
http://www.good.is/