

									  VISUALIZATION IN DATA MINING



       Data Mining (CSE-590)
             Spring 2009
       Prof. Anita Wasilewska


Abhishek Sharma      Arunkumar Senthilnathan
Computer Science        Computer Science
SUNY Stony Brook        SUNY Stony Brook
AGENDA
   References
   Visual Data Mining
   Goals of Visualization
   Diagrammatic Representation of the Process
   Data
   Visual Data Mining Techniques
       Data Visualization Techniques
           Simple 2D or 3D techniques
           Geometrically transformed displays
           Iconic displays
           Dense Pixel Displays
           Stacked Displays
       Data Interaction and Distortion Techniques
           Dynamic Projection
           Interactive Distortion
           Interactive Filtering
           Interactive Linking and Brushing
       Conclusion
REFERENCES
   Visual Data Mining: An Introduction and Overview by Simeon J. Simoff,
    Michael H. Böhlen, and Arturas Mazeika, School of Computing and Mathematics,
    College of Health and Science, University of Western Sydney, {arturas,
    boehlen}@inf.unibz.it
   Visual Data Exploration Techniques for System Administration by Tam
    Weng Seng
   Visualizing Hierarchies Via Exotic Trees by Ying-Huey Fua, yingfua@cs.wpi.edu
   Information Visualization and Visual Data Mining, Daniel A. Keim, IEEE
    Transactions on Visualization and Computer Graphics, Vol. 8,
    No. 1, January-March 2002
   http://noconsensus.files.wordpress.com/2009/01/histogram-of-sunspots.jpg
   http://www.statcan.gc.ca/edu/power-pouvoir/ch9/images/pie5.gif
   http://zedgraph.org/wiki/index.php?title=Scatter_Plot_Demo
   http://www.jpowered.com/php-scripts/line-graph/images/line-graph-negative.gif
   http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Inselberg/FewPoints.jpg
   http://www.drosophila-images.org/images-2007/09-slide-f.gif
   http://www.infovis-wiki.net/images/5/5a/Conetree.jpg
   http://www.mathworks.com/access/helpdesk/help/techdoc/data_analysis/cenus-line-hist-map1.png
VISUAL DATA MINING
   Basic Idea
     Visual presentation of data
     Gain insight and generate hypotheses
     Draw conclusions
     Directly interact with data

   Include the human in data exploration
     Use his or her flexibility
     Creativity
     General knowledge
GOALS OF VISUALIZATION
 Create a qualitative overview of large and complex
  data
 Ease the interpretation of data

 Speed up the mining process by making it
  intuitive
 Maximize the value of data to the user
 THE PROCESS




Visual Data Mining: An Introduction and Overview by Simeon J. Simoff,
Michael H. Böhlen, and Arturas Mazeika
DATA TO BE VISUALISED
 Data usually consist of a large number of records
 Each record has many attributes

 Each attribute is equivalent to a dimension

 Dimensionality – Number of attributes
     One dimensional
     Two dimensional
     Multi dimensional
     More complex – text and hierarchies
ONE DIMENSIONAL DATA
   Based on a single quantitative variable (attribute)

   Classic example is temporal data

   May be represented as
       Histogram
       Pie Chart
HISTOGRAM




  http://noconsensus.files.wordpress.com/2009/01/histogram-of-sunspots.jpg
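
A histogram is built by counting how many values of the single attribute fall into each bin. The following is a minimal sketch, assuming equal-width bins; the helper name `histogram_counts` and the sample values are illustrative and not taken from the slides.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of overflowing.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


sample = [1, 2, 2, 3, 5, 8, 9, 10]  # illustrative data only
counts = histogram_counts(sample, 3)
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")  # text rendering of the bars
```

The bar heights are the only information the chart needs; any plotting tool can draw them once the counts exist.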
PIE CHART




    http://www.statcan.gc.ca/edu/power-pouvoir/ch9/images/pie5.gif
TWO DIMENSIONAL DATA
 Two attributes (dimensions) are involved
 The attributes (dimensions) are related to each
  other
 Generally represented as
     Scatter Plots
     Line Graphs
SCATTER PLOT




       http://zedgraph.org/wiki/index.php?title=Scatter_Plot_Demo
 LINE GRAPHS




http://www.jpowered.com/php-scripts/line-graph/images/line-graph-negative.gif
THREE DIMENSIONAL DATA
MULTI-DIMENSIONAL DATA
 Practical data sets have more than 3 dimensions
 A relational database may have hundreds of
  columns (attributes/dimensions)
 How do we visually represent more than 3
  dimensions?
 A common technique for representing
  multidimensional data is “Parallel co-ordinates”
PARALLEL CO-ORDINATES




      Information Visualization and Visual Data Mining,
      Daniel A. Keim
TEXT AND HIERARCHIES
 Text
     Not described by numbers, so no standard
      visualization is applicable
     Transformation to description vectors is necessary
      before visualizations can be applied
 Hierarchies and relationships
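
The slides do not say which kind of description vector is meant. One simple possibility, assumed here purely for illustration, is a bag-of-words count vector over a shared vocabulary; the function name `description_vectors` is hypothetical.

```python
from collections import Counter


def description_vectors(documents):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["data mining", "visual data mining", "visual data"]  # illustrative texts
vocabulary, vectors = description_vectors(docs)
print(vocabulary)
print(vectors)
```

Once every text is a numeric vector, it can be handled like any other multidimensional record.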
CLASSIFICATION – INFORMATION AND
VISUALIZATION TECHNIQUES




   Information Visualization and Visual Data Mining, Daniel A. Keim
VISUALIZATION TECHNIQUES
   Simple 2D or 3D techniques
     X-y plots
     Bar-chart
     Pie chart
     Line graphs

 Geometrically transformed displays
 Iconic displays

 Dense Pixel Displays

 Stacked Displays
GEOMETRICALLY TRANSFORMED DISPLAYS
 Aim at finding interesting transformations of
  multidimensional data
 Mainly include
     Prosection view
     Hyper-slice
     Parallel Co-ordinates
PARALLEL CO-ORDINATES
 For high-dimensional multivariate data
 Each dimension is represented as a vertical
  line/axis (generally equidistant)
 A point in n-dimensional space is represented by
  a poly-line with intersections on the parallel axes
 Position of the vertex/intersection on the ith axis
  corresponds to the value of the ith co-ordinate of that
  point
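
The mapping from a point to its poly-line can be sketched as follows. This is a minimal sketch, assuming equidistant axes at x = 0, 1, 2, ... and each axis rescaled to [0, 1] from the minimum and maximum of its dimension; the name `to_polylines` is illustrative.

```python
def to_polylines(points):
    """Map each n-dimensional point to the vertices of its poly-line.

    Axis i is a vertical line at x = i; the vertex on that axis sits at
    the point's i-th value rescaled to [0, 1].
    """
    dims = len(points[0])
    lows = [min(p[i] for p in points) for i in range(dims)]
    highs = [max(p[i] for p in points) for i in range(dims)]
    polylines = []
    for p in points:
        vertices = []
        for i in range(dims):
            span = highs[i] - lows[i]
            # A constant dimension is drawn at the middle of its axis.
            y = (p[i] - lows[i]) / span if span else 0.5
            vertices.append((i, y))
        polylines.append(vertices)
    return polylines


points = [(1, 10, 100), (2, 30, 100), (3, 20, 100)]  # illustrative records
lines = to_polylines(points)
```

Connecting the vertices of each list in order gives one poly-line per record.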
PARALLEL CO-ORDINATES




http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Inselberg/FewPoints.jpg
PARALLEL CO-ORDINATES




  http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Inselberg/FewPoints.jpg
PARALLEL CO-ORDINATES




     http://www.drosophila-images.org/images-2007/09-slide-f.gif
ICONIC DISPLAYS
 Used for multidimensional data
 Map the attribute values of a multidimensional
  data item to the features of a symbol, or
  icon
 Icons might include faces, sticks, color icons and
  geometric shapes
ICONIC DISPLAYS
   Connection types
       (a) Telnet and login
       (b) Privileged FTP
       (c) Anonymous FTP
       (d) NFS
       (e) Initial access
       (f) Timed out
   Connection states
       (a) To be authenticated connections
       (b) Successfully established connections
       (c) The node under investigation
   Snapshots
       (a) Snapshot taken during forenoon
       (b) Snapshot taken after midnight

Visual Data Exploration Techniques for System Administration by Tam Weng Seng
DENSE PIXEL DISPLAYS

 Each attribute value is represented as a colored pixel
 The value range of each attribute is mapped to a fixed color
  map
 Different attributes are represented in separate sub-windows/areas
 Allows the largest amount of data possible on current
  displays
       1 pixel per data value
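
The value-to-pixel mapping can be sketched as follows. This is a minimal sketch, assuming a gray-scale color map with levels 0 to 255 and a simple row-by-row pixel arrangement; real dense pixel techniques use more elaborate arrangements, and the name `to_pixel_grid` is illustrative.

```python
def to_pixel_grid(values, width):
    """Map one attribute's values to gray levels 0-255, one pixel per value."""
    lo, hi = min(values), max(values)
    span = hi - lo
    levels = [round(255 * (v - lo) / span) if span else 0 for v in values]
    # Arrange the pixels row by row inside the attribute's sub-window.
    return [levels[i:i + width] for i in range(0, len(levels), width)]


# Illustrative records with two attributes each.
records = [(0, 5.0), (10, 6.0), (30, 10.0), (40, 5.0)]
# One sub-window (pixel grid) per attribute.
sub_windows = [to_pixel_grid([r[k] for r in records], width=2) for k in range(2)]
```

Because the same record occupies the same pixel position in every sub-window, patterns across attributes can be compared by position.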
DENSE PIXEL DISPLAYS
STACKED DISPLAYS
 Present data partitioned in a hierarchical fashion
 Basic concept is to stack one dimension over the
  other (dimensions are hierarchical)
 Main methods include
     Dimensional Stacking
     Worlds-within-worlds
     Tree maps
     Cone Trees
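
As an example of one of these methods, a tree map assigns each node of a hierarchy a rectangle whose area is proportional to its size. The following is a minimal sketch of the slice-and-dice layout, assuming each node is a `(name, size, children)` tuple and that the children's sizes add up to the parent's size; all names are illustrative.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign a rectangle (x, y, w, h) to every node of a size-weighted tree.

    The split direction alternates with depth: children are laid out
    side by side at even depth and stacked top to bottom at odd depth.
    """
    if out is None:
        out = {}
    name, size, children = node
    out[name] = (x, y, w, h)
    offset = 0.0  # fraction of the parent already used by earlier children
    for child in children:
        share = child[1] / size
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


tree = ("root", 8, [("a", 4, [("a1", 1, []), ("a2", 3, [])]), ("b", 4, [])])
rects = slice_and_dice(tree, 0.0, 0.0, 8.0, 4.0)
```

Each child rectangle lies inside its parent's rectangle, which is how the display stacks one level of the hierarchy inside another.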
CONE TREE




 http://www.infovis-wiki.net/images/5/5a/Conetree.jpg
 Visualizing Hierarchies Via Exotic Trees by Ying-Huey Fua
INTERACTION AND DISTORTION
TECHNIQUES
 Very important in addition to data visualization
  techniques
 Interactive techniques
     To interact with the visualization
     Dynamically change the visualization

   Distortion Techniques
       Help in exploring the data in detail
DYNAMIC PROJECTION
 Dynamically change the projection to explore a
  multi-dimensional data set
 Classic example – Grand Tour System
     Shows all interesting 2D projections of a
      multi-dimensional data set
     Series of scatter plots
     Order can be random, manual or pre-computed
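
One step of such a system, projecting the data onto a 2D plane, can be sketched as follows. This is a minimal sketch of the "random order" case only, assuming each plane is drawn at random and made orthonormal with Gram-Schmidt; it does not reproduce the smooth transitions of an actual grand tour, and the function names are illustrative.

```python
import math
import random


def random_projection_plane(dims, rng):
    """Return two orthonormal vectors spanning a random 2D plane."""
    u = [rng.gauss(0, 1) for _ in range(dims)]
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    v = [rng.gauss(0, 1) for _ in range(dims)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]  # Gram-Schmidt: remove the part along u
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def project(points, plane):
    """Project n-dimensional points onto the plane, giving 2D scatter plot points."""
    u, v = plane
    return [(sum(a * p for a, p in zip(u, pt)), sum(b * p for b, p in zip(v, pt)))
            for pt in points]


rng = random.Random(0)  # fixed seed so the sequence is repeatable
points = [(1.0, 2.0, 3.0, 4.0), (4.0, 3.0, 2.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
# A series of scatter plots, here in random order.
frames = [project(points, random_projection_plane(4, rng)) for _ in range(3)]
```

Each entry of `frames` holds the 2D coordinates for one scatter plot in the series.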
INTERACTIVE FILTERING
 Interactively partition the data into segments
 Can focus on interesting subsets

 Can be done by
     Browsing – Direct selection of desired subset
     Querying – Specifying properties of desired subset
     Browsing is difficult for large data sets

   Common tool – Magic Lenses
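
The difference between browsing and querying can be sketched as follows. This is a minimal sketch, assuming records are dictionaries; the attribute names and the helpers `query` and `browse` are made up for illustration.

```python
def query(records, **conditions):
    """Querying: keep the records whose attributes satisfy every condition."""
    return [r for r in records
            if all(test(r[attr]) for attr, test in conditions.items())]


def browse(records, selected_indices):
    """Browsing: keep the records the user picked directly."""
    return [records[i] for i in selected_indices]


records = [  # illustrative data only
    {"host": "a", "port": 21, "size": 1200},
    {"host": "b", "port": 23, "size": 50},
    {"host": "c", "port": 21, "size": 30},
]
large_ftp = query(records, port=lambda p: p == 21, size=lambda s: s > 100)
picked = browse(records, [0, 2])
```

Browsing needs the user to point at each record, which is why it becomes difficult for large data sets, while a query describes the subset once by its properties.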
INTERACTIVE DISTORTION
 Support data exploration
 Preserve the overview of data

 Idea
     Show certain portions of the data at a higher level of
      detail
     Other portions are still shown at lower levels of detail

   Example : Graphical fisheye view
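
The slides do not give the distortion function. A form commonly used for graphical fisheye views is g(t) = (d + 1)t / (dt + 1), where t is the normalized distance from the focus and d is the distortion factor. The following minimal sketch applies it along one axis, with that choice and the name `fisheye` as assumptions.

```python
def fisheye(x, focus, d=3.0, lo=0.0, hi=1.0):
    """Distort coordinate x in [lo, hi] so the region around focus is magnified."""
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    d_max = boundary - focus       # signed distance from the focus to the boundary
    if d_max == 0:
        return x
    t = (x - focus) / d_max        # normalized distance from the focus, in [0, 1]
    g = (d + 1) * t / (d * t + 1)  # g(0) = 0, g(1) = 1, steepest near the focus
    return focus + g * d_max


# A point close to the focus is pushed outward, so the focus region gets
# more screen space while the boundaries stay where they are.
print(fisheye(0.6, focus=0.5))
```

With d = 0 the function leaves every coordinate unchanged; larger values of d magnify the focus region more strongly while the whole range remains visible.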
FISHEYE VIEW
 Outline of US
 Focus on Missouri, Kentucky, Tennessee
INTERACTIVE LINKING AND BRUSHING
 All visualization methods have some strengths
  and some weaknesses
 Idea is to combine different visualization
  methods to overcome the shortcomings of each
 Multiple visualizations of the same data

 Changes made to one visualization are
  automatically made to another
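
The propagation of a brush from one view to another can be sketched as follows. This is a minimal sketch, assuming all views share one set of selected record indices; the class `LinkedViews`, the view names and the records are illustrative.

```python
class LinkedViews:
    """Several views of the same records that share one brushed selection."""

    def __init__(self, records):
        self.records = records
        self.selected = set()  # indices of brushed records, shared by all views
        self.views = []

    def add_view(self, name, attributes):
        self.views.append((name, attributes))

    def brush(self, predicate):
        """Brush in any view: select the records that satisfy predicate."""
        self.selected = {i for i, r in enumerate(self.records) if predicate(r)}

    def render(self):
        """Return, per view, the visible attribute values with a highlight flag."""
        return {
            name: [(tuple(r[a] for a in attributes), i in self.selected)
                   for i, r in enumerate(self.records)]
            for name, attributes in self.views
        }


records = [  # illustrative data only
    {"year": 1990, "population": 10, "region": "east"},
    {"year": 2000, "population": 14, "region": "west"},
    {"year": 2010, "population": 19, "region": "east"},
]
linked = LinkedViews(records)
linked.add_view("line graph", ["year", "population"])
linked.add_view("map", ["region"])
linked.brush(lambda r: r["population"] > 12)  # brush made in the line graph
views = linked.render()
```

Because the selection is stored once and read by every view, a brush made in the line graph is highlighted in the map without further work.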
   INTERACTIVE LINKING AND BRUSHING




http://www.mathworks.com/access/helpdesk/help/techdoc/data_analysis/cenus-line-hist-map1.png
CONCLUSION
 Exploration of large data sets is an important but
  difficult problem
 Information visualization techniques help to
  solve the problem
 Several techniques exist for data visualization;
  the appropriate technique should be chosen
  for the particular dataset
 Visual data exploration has high potential, and
  many applications, such as fraud detection and
  data mining, will use information visualization
  technology for improved data analysis
