Visualization in Data Mining
VISUALIZATION IN DATA MINING
Data Mining (CSE-590)
Spring 2009
Prof. Anita Wasilewska
Abhishek Sharma, Computer Science, SUNY Stony Brook
Arunkumar Senthilnathan, Computer Science, SUNY Stony Brook
AGENDA
References
Visual Data Mining
Goals of Visualization
Diagrammatic Representation of the Process
Data
Visual Data Mining Techniques
Data Visualization Techniques
Simple 2D or 3D techniques
Geometrically transformed displays
Iconic displays
Dense Pixel Displays
Stacked Displays
Data Interaction and Distortion Techniques
Dynamic Projection
Interactive Distortion
Interactive Filtering
Interactive Linking and Brushing
Conclusion
REFERENCES
Visual Data Mining: An Introduction and Overview by Simeon J. Simoff,
Michael H. Böhlen, and Arturas Mazeika, School of Computing and Mathematics,
College of Health and Science, University of Western Sydney, {arturas,
boehlen}@inf.unibz.it
Visual Data Exploration Techniques for System Administration by Tam
Weng Seng
Visualizing Hierarchies Via Exotic Trees by Ying-Huey Fua, yingfua@cs.wpi.edu
Information Visualization and Visual Data Mining, Daniel A. Keim, IEEE
TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 7,
NO. 1, JANUARY-MARCH 2002
http://noconsensus.files.wordpress.com/2009/01/histogram-of-sunspots.jpg
http://www.statcan.gc.ca/edu/power-pouvoir/ch9/images/pie5.gif
http://zedgraph.org/wiki/index.php?title=Scatter_Plot_Demo
http://www.jpowered.com/php-scripts/line-graph/images/line-graph-negative.gif
http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Inselberg/FewPoints.jpg
http://www.drosophila-images.org/images-2007/09-slide-f.gif
http://www.infovis-wiki.net/images/5/5a/Conetree.jpg
http://www.mathworks.com/access/helpdesk/help/techdoc/data_analysis/cenus-line-hist-map1.png
VISUAL DATA MINING
Basic Idea
Visual presentation of data
Gain insight and generate hypotheses
Draw conclusions
Directly interact with data
Include the human in data exploration
Use his/her flexibility, creativity, and general knowledge
GOALS OF VISUALIZATION
Create a qualitative overview of large and complex data
Ease the interpretation of data
Speed up the mining process by making it intuitive
Maximize the value of data to the user
THE PROCESS
Visual Data Mining: An Introduction and Overview by Simeon J. Simoff,
Michael H. Böhlen, and Arturas Mazeika
DATA TO BE VISUALISED
Data usually consist of a large number of records
Each record has many attributes
Each attribute is equivalent to a dimension
Dimensionality – Number of attributes
One dimensional
Two dimensional
Multi dimensional
More complex – text and hierarchies
ONE DIMENSIONAL DATA
Based on a single quantitative variable (attribute)
Classic example is temporal data
May be represented as
Histogram
Pie Chart
HISTOGRAM
http://noconsensus.files.wordpress.com/2009/01/histogram-of-sunspots.jpg
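A histogram summarizes one quantitative attribute by counting how many values fall into each interval (bin). The following is a minimal sketch, assuming equal-width bins; the function name `histogram` and the sample readings are illustrative and not taken from the slides.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not one past the end.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical measurements (illustrative values only).
readings = [3, 7, 8, 12, 15, 15, 21, 22, 29, 30]
for i, count in enumerate(histogram(readings, 3)):
    print(f"bin {i}: {'#' * count}")
```

Each printed row is one bar of the histogram, drawn as text.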
PIE CHART
http://www.statcan.gc.ca/edu/power-pouvoir/ch9/images/pie5.gif
TWO DIMENSIONAL DATA
Two attributes (dimensions) are involved
The attributes (dimensions) are related to each
other
Generally represented as
Scatter Plots
Line Graphs
SCATTER PLOT
http://zedgraph.org/wiki/index.php?title=Scatter_Plot_Demo
LINE GRAPHS
http://www.jpowered.com/php-scripts/line-graph/images/line-graph-negative.gif
THREE DIMENSIONAL DATA
MULTI-DIMENSIONAL DATA
Practical data sets have more than 3 dimensions
A relational database may have hundreds of columns (attributes/dimensions)
How do we visually represent more than 3 dimensions?
A common technique for representing multidimensional data is “Parallel co-ordinates”
PARALLEL CO-ORDINATES
Information Visualization and Visual Data Mining,
Daniel A. Keim
TEXT AND HIERARCHIES
Text
Not described by numbers, so no standard visualization is directly applicable
Transformation to description vectors is necessary before visualizations can be applied
Hierarchies and relationships
CLASSIFICATION – INFORMATION AND
VISUALIZATION TECHNIQUES
Information Visualization and Visual Data Mining, Daniel A. Keim
VISUALIZATION TECHNIQUES
Simple 2D or 3D techniques
X-y plots
Bar-chart
Pie chart
Line graphs
Geometrically transformed displays
Iconic displays
Dense Pixel Displays
Stacked Displays
GEOMETRICALLY TRANSFORMED DISPLAYS
Aims at finding interesting transformations of
multidimensional data
Mainly Include
Prosection view
Hyper-slice
Parallel Co-ordinates
PARALLEL CO-ORDINATES
For high-dimensional multivariate data
Each dimension is represented as a vertical line/axis (generally equidistant)
A point in n-dimensional space is represented by a poly-line with intersections on the parallel axes
Position of the vertex/intersection on the ith axis corresponds to the value of the ith co-ordinate of that point
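The mapping from a record to its poly-line can be sketched as follows. This is a minimal illustration, assuming each axis is scaled linearly to [0, 1] from a known (min, max) range; the names `polyline`, `ranges`, and `axis_spacing` are hypothetical.

```python
def polyline(record, ranges, axis_spacing=1.0):
    """Return the (x, y) vertices of the poly-line for one n-dimensional record.

    record: list of n attribute values
    ranges: list of n (min, max) pairs used to scale each axis to [0, 1]
    """
    vertices = []
    for i, (value, (lo, hi)) in enumerate(zip(record, ranges)):
        x = i * axis_spacing                              # position of the i-th vertical axis
        y = (value - lo) / (hi - lo) if hi > lo else 0.5  # height of the vertex on that axis
        vertices.append((x, y))
    return vertices


# Assumed value ranges for a 4-attribute data set (illustrative only).
ranges = [(0, 10), (0, 100), (-1, 1), (0, 4)]
print(polyline([5, 25, 1, 4], ranges))
```

Drawing one such poly-line per record, all on the same set of axes, gives the parallel co-ordinates display.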
PARALLEL CO-ORDINATES
http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Inselberg/FewPoints.jpg
PARALLEL CO-ORDINATES
http://www.drosophila-images.org/images-2007/09-slide-f.gif
ICONIC DISPLAYS
Used for Multidimensional data
Aims at mapping the attribute values of each multidimensional data item to the features of a symbol, or icon
Icons might include faces, sticks, color icons and
geometric shapes
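As one possible geometric-shape icon, a star glyph draws one ray per attribute and lets the ray length encode the attribute value. The sketch below is an assumption chosen for illustration (the slides do not name star glyphs); it expects values already normalized to [0, 1], and `star_glyph` is a hypothetical name.

```python
import math


def star_glyph(values, center=(0.0, 0.0), max_radius=1.0):
    """Map normalized attribute values (each in [0, 1]) to the ray tips of a star icon."""
    n = len(values)
    cx, cy = center
    tips = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n   # one ray per attribute, evenly spaced
        radius = v * max_radius       # ray length encodes the attribute value
        tips.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return tips


# One record with four normalized attributes (illustrative values only).
for tip in star_glyph([1.0, 0.5, 0.0, 0.5]):
    print(tuple(round(c, 3) for c in tip))
```

Connecting the ray tips gives one icon per record, so records with similar attribute values produce similar shapes.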
ICONIC DISPLAYS
[Figure: iconic display of network connections]
Icon legend: (a) Telnet and login, (b) Privileged FTP, (c) Anonymous FTP, (d) NFS, (e) Initial access, (f) Timed out
Connection legend: (a) To be authenticated connections, (b) Successfully established connections, (c) The node under investigation
Panels: snapshot taken during forenoon; snapshot taken after midnight
Visual Data Exploration Techniques for System Administration by Tam Weng Seng
DENSE PIXEL DISPLAYS
Each attribute value is represented as a colored pixel
The value range of each attribute is mapped to a fixed color map
Different attributes are represented in separate sub-windows/areas
Allows the largest possible amount of data to be shown on current displays
1 pixel per data value
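The pixel mapping can be sketched as follows. This is a simplified illustration, assuming a linear blue-to-red color ramp and a plain row-by-row arrangement of the pixels; published dense pixel techniques use more elaborate arrangements, and the names `to_color` and `pixel_window` are hypothetical.

```python
def to_color(value, lo, hi):
    """Map a value in [lo, hi] to an (r, g, b) color on a fixed blue-to-red ramp."""
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    return (round(255 * t), 0, round(255 * (1 - t)))


def pixel_window(column, width):
    """Lay out one attribute's values row by row, one pixel per data value."""
    lo, hi = min(column), max(column)
    pixels = [to_color(v, lo, hi) for v in column]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# One attribute with six values, shown in a sub-window three pixels wide.
window = pixel_window([0, 5, 10, 2, 8, 10], width=3)
for row in window:
    print(row)
```

Building one such window per attribute, with records in the same order in every window, places each record at the same pixel position across all sub-windows.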
STACKED DISPLAYS
Present data partitioned in a hierarchical fashion
Basic concept is to stack one dimension over the other (dimensions are hierarchical)
Main methods include
Dimensional Stacking
Worlds-within-worlds
Tree maps
Cone Trees
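As an illustration of one of these methods, a tree map can be laid out by splitting a rectangle among the children of each node in proportion to their sizes. The sketch below assumes the simple "slice-and-dice" variant, a nested-dictionary tree, and strictly positive leaf sizes; all names are hypothetical.

```python
def size(node):
    """Total size of a node: its own size for a leaf, else the sum of its children."""
    if "children" in node:
        return sum(size(child) for child in node["children"])
    return node["size"]


def treemap(node, x, y, w, h, vertical=True, out=None):
    """Slice-and-dice layout: split the rectangle among the children in proportion
    to their size, alternating the split direction at each level of the hierarchy."""
    out = [] if out is None else out
    if "children" not in node:
        out.append((node["name"], x, y, w, h))
        return out
    total = size(node)
    offset = 0.0
    for child in node["children"]:
        share = size(child) / total
        if vertical:   # cut the rectangle along its width
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:          # cut the rectangle along its height
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


# A small illustrative hierarchy (sizes are made up).
tree = {"name": "root", "children": [
    {"name": "a", "size": 2},
    {"name": "b", "children": [{"name": "b1", "size": 1}, {"name": "b2", "size": 1}]},
]}
for rect in treemap(tree, 0, 0, 4, 2):
    print(rect)
```

Each output tuple is (name, x, y, width, height) for one leaf, and the area of each rectangle is proportional to the leaf's size.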
CONE TREE
http://www.infovis-wiki.net/images/5/5a/Conetree.jpg
Visualizing Hierarchies Via Exotic Trees by Ying-Huey Fua
INTERACTION AND DISTORTION TECHNIQUES
Very important in addition to data visualization techniques
Interactive techniques
To interact with the visualization
Dynamically change the visualization
Distortion techniques
Help in exploring the data in detail
DYNAMIC PROJECTION
Dynamically change the projection to explore a multi-dimensional data set
Classic example – Grand Tour System
Shows all interesting 2D projections of a multi-dimensional data set
Series of scatter plots
Order can be random, manual or pre-computed
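A sequence of random 2D projections can be sketched as follows. This is a simplified illustration of the "random order" case only, assuming each projection plane is built from two random vectors made orthonormal by Gram-Schmidt; it does not reproduce the smooth transitions of an actual Grand Tour system, and the names `random_plane` and `project` are hypothetical.

```python
import math
import random


def random_plane(n, rng):
    """Return two orthonormal n-dimensional vectors spanning a random 2D plane."""
    u = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]   # Gram-Schmidt: remove the component along u
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def project(points, plane):
    """Project n-dimensional points onto the plane, giving 2D scatter-plot coordinates."""
    u, v = plane
    return [(sum(p * a for p, a in zip(pt, u)), sum(p * b for p, b in zip(pt, v)))
            for pt in points]


rng = random.Random(0)                        # fixed seed so the run is repeatable
points = [[1, 2, 3, 4], [4, 3, 2, 1], [0, 0, 1, 1]]
for step in range(3):                         # one scatter plot per random plane
    print(project(points, random_plane(4, rng)))
```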
INTERACTIVE FILTERING
Interactively partition the data into segments
Can focus on interesting subsets
Can be done by
Browsing – Direct selection of desired subset
Querying – Specifying properties of desired subset
Browsing is difficult for large data sets
Common tool – Magic Lenses
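The difference between browsing and querying can be sketched as follows. This is a minimal illustration over an in-memory list of records; the record fields and the function names `browse` and `query` are hypothetical.

```python
# Illustrative records (values are made up).
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 51, "income": 87000},
    {"id": 3, "age": 29, "income": 41000},
    {"id": 4, "age": 45, "income": 99000},
]


def browse(records, selected_ids):
    """Browsing: the user directly picks the records of interest."""
    return [r for r in records if r["id"] in selected_ids]


def query(records, predicate):
    """Querying: the user states a property and the matching records are found."""
    return [r for r in records if predicate(r)]


print(browse(records, {1, 3}))
print(query(records, lambda r: r["age"] > 40 and r["income"] > 80000))
```

Browsing requires the user to identify each record by hand, which is why it scales poorly; querying only requires stating the property.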
INTERACTIVE DISTORTION
Support data exploration
Preserve the overview of data
Idea
Show certain portions of the data at a higher level of detail
Other portions are still shown at a lower level of detail
Example: graphical fisheye view
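A fisheye distortion can be sketched in one dimension as follows. This is a minimal illustration, assuming one commonly used distortion function, g(t) = (d + 1)t / (dt + 1), where t is the normalized distance from the focus and d is the distortion factor; the slides do not specify a formula, and the name `fisheye` is hypothetical.

```python
def fisheye(x, focus, d=3.0, lo=0.0, hi=1.0):
    """Distort coordinate x in [lo, hi] so that the region near focus is magnified.

    d = 0 leaves coordinates unchanged; larger d magnifies the focus region more.
    """
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    d_max = boundary - focus            # signed distance from the focus to the edge on x's side
    t = (x - focus) / d_max             # normalized distance from the focus, in [0, 1]
    g = (d + 1) * t / (d * t + 1)       # distortion: stretches small t, compresses large t
    return focus + g * d_max


# Evenly spaced positions before and after distortion around a focus at 0.5.
for x in [0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0]:
    print(x, "->", round(fisheye(x, focus=0.5), 3))
```

Points near the focus are pushed apart while points near the boundary are squeezed together, so the overview stays visible.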
FISHEYE VIEW
Outline of US
Focus on Missouri, Kentucky, Tennessee
INTERACTIVE LINKING AND BRUSHING
All visualization methods have some strengths
and some weaknesses
Idea is to combine different visualization methods to overcome the shortcomings of each
Multiple visualizations of the same data
Changes made to one visualization are automatically reflected in the others
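The linking idea can be sketched as several views that share a single selection. This is a minimal illustration without any actual drawing; the class `LinkedViews`, its methods, and the sample records are hypothetical.

```python
class LinkedViews:
    """Several views of the same records share one selection (the "brush")."""

    def __init__(self, records):
        self.records = records
        self.selected = set()   # indices of brushed records, shared by all views
        self.views = []         # (view name, attribute shown) pairs

    def add_view(self, name, attribute):
        self.views.append((name, attribute))

    def brush(self, attribute, lo, hi):
        """Select records whose attribute lies in [lo, hi], as if dragged in one view."""
        self.selected = {i for i, r in enumerate(self.records)
                         if lo <= r[attribute] <= hi}

    def render(self):
        """Return, per view, each record's value and whether it is highlighted."""
        return {name: [(r[attr], i in self.selected)
                       for i, r in enumerate(self.records)]
                for name, attr in self.views}


# Illustrative records (values are made up).
data = [{"year": 1990, "pop": 10}, {"year": 2000, "pop": 14}, {"year": 2010, "pop": 19}]
linked = LinkedViews(data)
linked.add_view("line graph", "pop")
linked.add_view("histogram", "year")
linked.brush("year", 1995, 2005)   # brushing in one view ...
print(linked.render())             # ... highlights the same record in every view
```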
INTERACTIVE LINKING AND BRUSHING
http://www.mathworks.com/access/helpdesk/help/techdoc/data_analysis/cenus-line-hist-map1.png
CONCLUSION
Exploration of large data sets is an important but
difficult problem
Information visualization techniques help to
solve the problem
Several techniques exist for data visualization; the appropriate technique should be chosen for a particular dataset
Visual data exploration has high potential, and many applications such as fraud detection and data mining will use information visualization technology for improved data analysis