Networks and Graphs
IS 247 Information Visualization and Presentation
19 April 2002
James Reffell, Moryma Aydelott, Jean-Anne Fitzpatrick

This week's papers
• K. M. Fairchild, S. E. Poltrock, and G. W. Furnas, "SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases" (1988)
• S. G. Eick and G. J. Wills, "Navigating Large Networks with Hierarchies" (1993)
• R. A. Becker, S. G. Eick, and A. R. Wilks, "Visualizing Network Data" (1995)
• T. Munzner, "Exploring Large Graphs in 3D Hyperbolic Space" (1998)
• T. Munzner, F. Guimbretière, and G. Robertson, "Constellation: A Visualization Tool for Linguistic Queries from MindNet" (1999)

Problem Statement
"Visual representations of generalized graphs of even modest size tend to look like a ball of tangled string. While the indicated relationships may be logically correct, they may also be visually impenetrable."
(From Readings in Information Visualization: Using Vision to Think, Introduction to Section 2.5)

Key Goals
• Represent complexity of graphs, including cross-links, cycles
• Scale to large networks (thousands of nodes, millions of edges)
• Provide interaction and navigation that assists in exploration
• Position and represent nodes in a way that is clear and conveys information
• Display links in a way that is clear and conveys information

Key Goals, Cont'd
Note that the last two goals each include a negative side:
– Avoid occlusion, avoid edge crossing, avoid overwhelming quantity of edges, etc., so that the visualization is clear
and a positive side:
– Use positioning, retinal qualities, and selection/filtering to actually add to the information conveyed
Rules are made to be broken?

Purposes / Uses
• Various applications:
  – Software architecture
  – Web architecture
  – Semantic relationship data
  – Physical network data (Internet traffic)
  – Large social networks
  – Email, telephone (social and physical networks!)
  – The "old reliable" data set: the Unix file system
  – More?

Techniques
• Filtering
• Clustering
• Aggregation
• Focus + context
• Semantic zooming
• Selection of organizing structure (geographic map, spanning tree, semantic structure)
• Selection of representation (color, size, glyphs)
Typically, these techniques reduce the amount of detail displayed simultaneously; interaction is key.

Important point!
• Fairchild notes that visualizations of large networks are constrained by the limitations of hardware responsiveness and human perceptual capabilities:
  – By Moore's law, the former can be expected to become (and has become) less of an issue
  – The latter will not!

SemNet // Fairchild et al.
Fairchild, Poltrock & Furnas, SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases (1988)
– Early work, visualization of a large knowledge base
– General overview of problems and solutions for large network visualizations
– Directed graphs in three-dimensional space
– Labeled rectangles represent groups of rules expressed as Prolog modules. These are connected by colored arcs representing possible paths between the groups of rules. Messages are shown as labeled objects that move from group to group via the arcs.

SemNet // Fairchild et al.
Positioning Elements
- One purpose of a network visualization is to reveal structure: "the details … are de-emphasized and the structure is emphasized"
- A large number of arc (edge) crossings is a problem; 3D helps, but layout is still an issue
- Poor positioning can confuse, hiding structure instead of revealing it
- No general solution; it can depend on the domain and other factors
- Three tactics: mapping, connectivity, manipulation

SemNet // Fairchild et al.
Positioning Elements (cont'd)
- Mapping: Using domain-specific properties to arrange layout.
  - Examples: Geography, dimensions of mammals (size, predacity, domesticity)
- Connectivity: Using structural properties to arrange layout.
  - Elements that are more directly connected to each other should be displayed closer together.
  - Perfection is not possible (because of dimension limitations)
  - Technique: Multidimensional scaling
  - Technique: Heuristics
  - Initial conditions: sometimes random
- Manipulation: User-controlled layout.
  - Can be used in combination with other methods

SemNet // Fairchild et al.
Coping with too much information
– One method: Show subsets by type / property
– Another: Fisheye Views! Approaches include:
  • Clustering: Group elements together. Elements closer to the focal point go in small sets, elements farther away in progressively larger sets. Clusters are represented as rectangles.
    – SemNet adds functions to assist with understanding clusters: color coding of rectangles to indicate adjacency to the focal point, naming by most-connected node, representation of the proportion of total nodes included in each set
    – Note: this approach is similar to semantic zooming, and they suggest this as a possibility for extension!
  • 3D Point Perspective: Implicit in the 3D view; nearer objects appear larger than those farther away.
  • Sampling Density (not used): Focal point has higher resolution, distant nodes fade into nothingness
– Arcs! Arcs are only shown when an element it connects is visible.

SemNet // Fairchild et al.
Navigation & Browsing
– Recognition: Where am I?
  • Allow users to leave 'markers'
  • Path retracing
  • Consistency with 3D metaphor is important
  • Depth: oscillation is distracting, small random movement is OK
– Control & Movement: How do I move around?
  • Relative: Movement along 3 rotations plus forward and backward movement. Difficult to control and very disorienting.
  • Absolute: Movement using separate (2D each) maps of the space. Fine control is difficult, and so is the relationship between the dimensions.
  • Teleportation: Move directly to a location, normally one already visited (so similar in effect to path retracing)
  • Hyperspace: Movement by semantic relationship. Links!
  • Moving the space: Rotate and move the structure rather than the viewpoint.

SemNet // Fairchild et al.
Other issues:
• Dynamic execution
  – Labeled sprites representing pieces of words move along arcs.
  – User controls speed
  – Color changes of arcs and elements indicate status (used / unused)
• Application use
  – This application is for debugging, so it is domain specific and includes tools for doing so.

Bell Labs Papers: Eick and Wills // Becker, Eick and Wilks
S. G. Eick and G. J. Wills, Navigating Large Networks with Hierarchies (1993)
• Visualizations of e-mail and software engineering data
• Data used has no direct spatial layout
R. A. Becker, S. G. Eick, A. R. Wilks, Visualizing Network Data (1995)
• Multiple visualizations of long distance network, Internet traffic, and e-mail data
• Data often has a direct spatial layout
(Both emphasize user control / manipulation of the display)

Bell Labs Papers: Eick and Wills // Becker, Eick and Wilks
"Since the needs of each user are unique, our visualizations are task-oriented. Our most successful visualizations help frame interesting questions as well as answer them. Our visualizations:
• Make use of existing data….
• Focus on real problems with targeted users….
• Leverage interaction. … Dynamic interaction allows users to separate the wheat from the chaff.
• Are information dense. …
• Focus on understanding and insight. Results are more important than any particular technique."
(from http://www.acm.org/sigchi/chi95/Electronic/documnts/demos/bsj_bdy.htm)

Navigating Large Networks with Hierarchies (Eick and Wills)
HierNet
• Used to examine large hierarchical networks of 500 to millions of nodes (they claim).
• Nodes placed to show relationships (not to convey geographic data or use all screen space).
• Node area and color show size and function; links show which relationships exist and how "hot" they are.
• User controls the time period viewed and the "spread" of link colors; can change the view to hide / display chosen portions or focus on selected regions.

Navigating Large Networks with Hierarchies (Eick and Wills)
Well?
• Placement algorithm showed both expected and surprising groupings
• Using hierarchical data for scalable aggregation allowed users to reduce the size of the data set, and actually interact with it, without reducing content (they claim)
• Zooming allows users to drill down the hierarchy (from module to file views)
• Use of linking (for link strengths) and mouseovers (for labels) adds to usability.

Navigating Large Networks with Hierarchies (Eick and Wills)
What's different (and actually works):
• Not using geography to map data with geographic components (their rationale: since geography is known and unchanging, why bother showing it)
• Not mapping data to give maximum distance between nodes (their rationale: more chance for overlap, but it should increase the significance of distance between nodes)
• Not trying to show everything: heavy use of aggregation (reducing 8 million links to a much smaller set, then displaying only the top 1%) reduced screen clutter yet still allowed users to see important data. But who knows if something valuable was left out?
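
The aggregate-then-filter idea in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HierNet implementation: the function names `aggregate_links` and `top_fraction`, the `parent` mapping from files to modules, the choice to treat links as undirected, and the toy weights are all made up for the example.

```python
from collections import defaultdict
import math

def aggregate_links(links, parent):
    """Roll node-level links up to their parent groups, summing weights.

    links  : iterable of (source, target, weight) tuples
    parent : dict mapping each node to its group in the hierarchy
    Links whose endpoints fall in the same group are dropped.
    """
    totals = defaultdict(float)
    for src, dst, weight in links:
        a, b = parent[src], parent[dst]
        if a == b:
            continue  # internal to one group, not a between-group link
        key = (a, b) if a <= b else (b, a)  # assumption: links are undirected
        totals[key] += weight
    return dict(totals)

def top_fraction(weighted_links, fraction):
    """Keep the heaviest `fraction` of links (always at least one)."""
    ranked = sorted(weighted_links.items(), key=lambda kv: kv[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical toy data: four files grouped into three modules.
parent = {"a.c": "io", "b.c": "io", "c.c": "net", "d.c": "ui"}
links = [
    ("a.c", "c.c", 5.0),
    ("b.c", "c.c", 2.0),
    ("a.c", "b.c", 9.0),  # both files are in module "io", so this is ignored
    ("c.c", "d.c", 1.0),
    ("d.c", "a.c", 0.5),
]

module_links = aggregate_links(links, parent)   # file links -> module links
strongest = top_fraction(module_links, 0.5)     # keep the heaviest half
```

Here five file-level links collapse to three module-level links, and the filter keeps the two heaviest. The weakest link (between "io" and "ui") disappears from the display, which is exactly the trade-off the slide questions: clutter goes down, but low-weight links that might matter are no longer visible.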

Navigating Large Networks with Hierarchies (Eick and Wills)
What's different (and not so effective):
• Confusing color mappings (using red for both implies a connection between clerical nodes and "hot" links)
• Redundant node messages (box shape and color both say the same thing)
• Filtering may result in nodes without links; this is not considered a problem in the paper but may be confusing for a user
• Aesthetic considerations: color combinations are a bit jarring

Visualizing Network Data (Becker, Eick, Wilks)
SeeNet System
• A walk through varied representations of the same data: AT&T Long Distance Network calls to / from the Bay Area after the October 17, 1989 Loma Prieta Earthquake
Other Applications of the SeeNet System
• CICNet, e-mail communications, world Internet data

Visualizing Network Data (Becker, Eick, Wilks)
Similarities to previous system
• Similar uses of color and node proportion
• User can interact with and manipulate display features
With some differences…
• (In many cases) nodes placed to reflect their geographic / spatial relationships
• Multiple types of views available

Visualizing Network Data (Becker, Eick, Wilks)
Representations Used:
• Bi-directional values shown in one line
• Box width / height to show numbers of inbound / outbound calls
• Link colors and width reflect statistical data
• Matrix position (approximately) reflects geographic position
• Matrix box size (if small) and color (if not blue) represent call load

Visualizing Network Data (Becker, Eick, Wilks)
Pro
• Shows network connectivity
• Color and line thickness compactly convey statistical data
• Half-lines aren't so bad when you know where at least one half goes
Con
• Cross-country lines obscure data in the middle of the country
• Longer lines are "given undue visual prominence"
• When they could be to / from anywhere, half-lines are hard to follow
• (and NJ / NY are in the Atlantic!)

Visualizing Network Data (Becker, Eick, Wilks)
Node Map – Pro
• Display is uncluttered
• Easy to see patterns
Node Map – Con
• Aggregation again: only overall node data available
• Lots of empty space that's wasted?
Matrix – Pro
• All links given same visual importance
Matrix – Con
• "Ambiguity of row and column order": could maybe get a feel for location, but have to select a data point to really know what nodes are involved.

Visualizing Network Data (Becker, Eick, Wilks)
Parameter Focusing
• Statistics (using logs or percentages or …)
• Levels (selecting data to show or suppress)
• Geography (zooming in or panning out)
• Time (selecting a time period)
• Size (changing overall size of symbols or link length / width)
• Color (using color slider to maximize / highlight differences)
Direct Manipulation
• Identification (mousing over a node or link to see the data behind it)
• Link Map Parameter Controls (dynamic manipulation of link color, width and length)
• Matrix Display Parameter Controls (dragging and dropping rows and columns)
• Node Map Parameter Controls (dynamic manipulation of node color and size)
• Animation (controlling speed)
• Zooming (changing focus, filtering by what's in the view)
• Conditioning (viewing two variables at once)
• Sound (to convey state, frame and selection changes)

Other SeeNet Applications
Adaptability of the system is demonstrated by the variety of data it can effectively display
• (Long Distance Network Load)
• Internet Packet Flows
• E-mail Communication
• Country-Country Internet Traffic

Further Work
(Though the image below is from: Kenneth C. Cox, Stephen G. Eick, and Taosong He, 3D geographic network displays, ACM Sigmod Record, 25(4), 50-54, December 1996)
What: Internet traffic flows between fifty countries, as measured by the NSFNET backbone in 1993.
Effectiveness:
• Differing heights make the most important (high-traffic) links the highest, and therefore the most visually prominent, on the map.
• Overlapping lines and the flat orientation of the map make it difficult to pick out which arcs go from where to where.

Many more interactive infovis systems from AT&T Bell Laboratories:
(An overview is available at http://www.acm.org/sigchi/chi95/Electronic/documnts/demos/bsj_bdy.htm)
• SeeData: relational data
• SeeDiff: file system differences
• SeeLib: bibliographic databases
• NicheWorks: abstract networks
• SeeLog: time-stamped log reports
• SeeNet: linked geographic data
• SeeSlice: program slices and code coverage
• SeeSoft: lines of text in files
• SeeSys: hierarchical software modules
• SeeTree: hierarchical data

H3 / H3Viewer (Munzner)
A "second-generation hyperbolic cone tree"
• Similar concept to Lamping and Rao's hyperbolic browser (seen last week), but projects onto a sphere rather than a circle, and handles graphs as well as trees
• Also draws on ConeTree (seen several times earlier in the semester), but distributes child nodes on the surface of a hemisphere rather than the circumference of a circle
• Basis for the XML3D application (paper by Risden et al. discussed in the week on empirical evaluation)

H3 / H3Viewer
• Spanning tree used as backbone for layout
  – A tree that contains all nodes (but only a subset of the edges) of the graph
• Crucial point: Choosing the right spanning tree requires domain-specific knowledge.
  – "The key idea is that many non-tree graphs exist for which the right spanning tree can provide a useful mental model of the entire structure"
  – Examples: Directory structure as spanning tree for web architecture, total execution time data to select spanning tree for function calls

H3 / H3Viewer
• Benefits of hyperbolic projection as noted before: a large amount of information can be displayed, and distant objects are automatically reduced to < 1 pixel, but even distant trees (when visible) can be perceived as dense or sparse.
• Various techniques to preserve orientation
  – Animation
  – Rotate to maintain orientation with ancestors on left, descendants on right
  – Child node with the most descendants always at "pole" of hemisphere
• Static or dynamic attributes of nodes and edges can be coded with color, line width
(Figures from http://graphics.stanford.edu/papers/h3cga/)

H3 / H3Viewer
• Described as providing "reasonable balance between information density and clutter"
• Interactivity supports exploration
  – Also implies each view need not be "polished", in part because the user can adjust it, e.g., if lines intersect or nodes occlude
• "Graph as index": combine with other views to get all details
  – Example: Site Manager
(Figure from http://graphics.stanford.edu/papers/h3draw/)

Constellation (Munzner, Guimbretière, and Robertson)
• Very different graph layout
• Special purpose, highly complex subject domain! (semantic networks)
• Interesting interface features:
  – hovering for "light-weight query"
  – pie menu for selection of constellation to highlight

Constellation
• Data is calculated "best paths" between two words:
  – Words actually on the path linking the start and end term
  – Words whose definitions were used in constructing the path
  – Different relationships between words are encoded (e.g., part of, is a, has a)
• Paths are ordered by plausibility (before being graphed)
(Figure from http://graphics.stanford.edu/papers/munzner_thesis/html/node10.html)

Constellation
• Layout uses left-to-right orientation to encode plausibility (vs. the more typical use of layout to show clustering / distance between nodes)
• "Plausibility gradient" is also represented by size
• Does not attempt to minimize edge crossings; instead, uses visual properties such that most edges remain in the background except when part of a highlighted "constellation" of nodes and edges
• Video makes it clearer!
  (I hope)
(Figures from http://graphics.stanford.edu/papers/munzner_thesis/html/node10.html)

Constellation
• Illustrates:
  – Rules were made to be broken
  – Intersection of mental model and data model

User Testing?
• Fairchild: Evaluation based on user study, but data not included
• Bell Labs: mention users but not testing
• Munzner et al.:
  – Previous study on XML3D
  – Constellation used a user-centered design process; no formal testing because there were only 3 users!