Jay Zeng
University of Washington
Mary Gates Hall, Ste 370
Seattle, WA 98195

Copyright is held by the author/owner(s).
CHI 2009, April 4 – 9, 2009, Boston, MA, USA

Abstract
Information security is vital to the success of every organization. The explosive growth of the Internet has driven a corresponding growth in network size and complexity, so building a secure computing infrastructure is an escalating challenge. Organizations are making efforts to build trustworthy systems. The main strategy is to actively monitor network activity to identify successful and unsuccessful attacks or abuses. This typically involves a wide variety of security solutions and appliances, including firewalls, anti-virus software, VPNs, and intrusion detection systems. While these devices have different functions and mechanisms, they have one important thing in common: they produce large volumes of highly detailed security data. The number of devices grows as the organization's infrastructure grows. A large organization's network can contain a huge number of them, and the volume of security data to be analyzed quickly becomes overwhelming. Security visualization is one effective way to comprehend large amounts of data. The challenge is to develop visual representations and a user interface that attain and preserve high-level contextual awareness while an analyst investigates an event's low-level details.

Keywords
Security visualization, information security, user-centered design
ACM Classification Keywords
H4.m. Information Systems Applications: Miscellaneous.

Although security data contain pertinent information, security analysts usually have difficulty determining an alert's severity and accuracy from the alert alone, because information comes from myriad sources, including network devices, applications, and other components. Relevant contextual information from across the network must be collected to build that understanding. Whether an analysis is data rich (with sufficient security data) or data poor (with minimal or no security data), analyzing network security events is a complex and time-consuming task [Ref.1]. Contextual data generally comes from network packets. My survey and research show that analysts most often use textual and tabular tools such as Wireshark (http://www.wireshark.org/) or Tcpdump (http://www.tcpdump.org/). These tools focus on obtaining detailed information from network packets, but they lack a mechanism for providing a comprehensive visual view of the data. To build up contextual information effectively, security analysts have to use multiple consoles: one window for network packets, another for server logs, and still another for security alerts. When analysts try to understand the details of these packets within the larger context of surrounding network activity, the overwhelming amount of information from different security devices reduces their productivity and increases their already considerable cognitive load [Ref.2]. Additionally, these tools excel at filtering and searching for details, but only if analysts know exactly what they are looking for in the data. The tools become slow and cumbersome when analysts explore less structured data to discover patterns and anomalies.

In order to overcome these limitations, I designed a web application that gives analysts both a comprehensive overview and individual log details. By integrating all views into a single user interface and using a number of web services to automate some of the common tasks involved in building up contextual information, this tool automates the processes of discovering, analyzing, and visualizing anomalies and potential attacks.

Approach
I began my work by surveying security analysts to determine how they build up contextual information. The survey was conducted online, distributed to two security mailing lists, and was live for one week. It consists of about 20 questions that ask participants about their backgrounds, their tool choices, how they retrieve detailed information, and what particular information they pay attention to depending on context. Because some survey questions ask for information that could be important or sensitive to participants' organizations, the survey offers participants anonymity. It is virtually impossible to guarantee that participants answer questions truthfully, so the survey data needs to be taken with a grain of salt. There are, however, a number of principles that I applied to increase the general quality of the data and readers' confidence in the results.
• Open availability of survey results. The raw survey results are open to the community, allowing anyone to draw their own conclusions or take issue with the findings.
• Transparency of process. The status of the project and its related reports and tools are always publicly available on the project website.
• Open participation and independence. Organizations that demonstrate a willingness and ability to volunteer their time are welcome to participate. The project is purely voluntary and is used entirely for educational purposes.

A total of 38 valid participants completed the survey questionnaire. The participants have different roles within their organizations, with a representative split between security vendors (47%), enterprise professionals (34%), and independent researchers (19%). The breakdown by size is as follows:

Figure 1: Number of Employees

Methodology:
The goal of the survey was to measure as accurately as possible how analysts build up contextual information about the network. While I recognize the inherent limitations of Web-based surveys, the goal was to collect data that can be used to identify limitations of popular tools. The survey was created with the following principles:

• The process should be transparent for all participants and the community at large.
• The questions should be formulated neutrally so as not to affect results.
• All responses should be anonymous.
• All raw survey results should be made available to the community.

To allow respondents to describe their organizations candidly, no identifying information, such as organization names, URLs, or IP addresses, was collected.
Results:

Figure 2: Technology Platforms (surviving chart labels: Microsoft, 52%, 15%)

Over half of the organizations rely on Microsoft technologies and about 30% deploy open source technologies; the rest use both platforms. Interestingly, over 90% of security analysts use the textual and tabular tools Tcpdump and Wireshark regardless of the technology their organization relies on. Other notable tools are Angus, Microsoft Network Monitor, and commercial applications [Ref.3].

When asked what contextual information is necessary, participants marked the following areas as the starting point:

• Log files from deployed services, such as web server logs, operating system event logs, and firewall logs; these vary with the organization's computing infrastructure.
• Relevant network packets that provide detailed information about user activities, including source IP address, destination IP address, and
• Vulnerabilities that are used against the network.
• The geographical locations of malicious activity. This is particularly important to analysts when the system is being or has been compromised.

Potential Causes for Inaccurate Results:

• The survey respondents are mostly from security research and consultancy organizations. As a result, the surveyed organizations and individuals do not form a completely random group.
• The relatively small number of valid responses (38) makes statistical modeling and correlation difficult.
• Different understandings of what counts as "contextual information" could also influence the final result.
Outcome
Based upon the survey results, I developed a web application to overcome the identified problems. A web application was chosen for the following reasons:

• Most organizations have a website, which requires the firewall or other security systems to grant access to the HTTP protocol. Deploying a web application therefore does not violate firewall rules or organization policies.
• A web browser ships with most modern operating systems, and most users have experience with one. Users face a lower learning curve with a web application than with command line tools.
• A web application provides standardized access for all users. This eases the process of passing information such as network packets, log files, or reports between departments within the organization.

Due to limited time and technical constraints, the current implementation takes a scaled-down approach by focusing on a web server log. The underlying design principles and workflows also apply to other types of information, such as system logs and network packets.

On one hand, security analysts want very detailed, technical information with the capability to extract specifics. On the other hand, they want a comprehensive overview of this large amount of information. This application provides both a high-level overview and a detail view within the same user interface by placing them in different tabs.

When presenting detailed information, the application inherits the view from textual and tabular applications in order to reduce the analyst's cognitive load. It displays all fine-grained log entries in a table with the following enhancements:

• Highlight the selected row: analysts often lose track of the entries they have spotted. Highlighting the row helps them remain focused.
• Improved searching and filtering: each field can be searched and/or sorted, which allows analysts to filter out irrelevant data and sort the rest in the way they desire. This is particularly useful when analysts know exactly what they are looking for.
• Quick statistics: next to the table, a list of the top 10 IP addresses by daily visits and a pie chart of the IP distribution are provided. Analysts often want to know how many IP addresses have accessed their networks in a given period; showing this next to the log entry table saves them the time of running commands or queries to calculate these statistics.

Besides access to the detailed information, the other view provides visualizations generated from the tabular data and some automated tasks. This view contains a number of components, introduced below; screenshots of these components appear in a later section for reference.

Visual builder, which is built on the Google Chart API [Ref.5], is an improved version of the traditional form-based visualization generator. It offers users a high degree of customization to produce the visualizations they want. Currently, it supports chart types such as line chart, sparkline, radar chart, and 2D pie chart. Visual builder has the following advantages:

• Highly customizable: analysts have almost complete control of the chart to be generated. They can select an appropriate chart type and even specify meta-properties such as title text, title color, background color, width, height, and the labels to be displayed on the chart.
• Easy to save and transmit generated visualizations: because of the nature of the Google Chart API, every generated image is simply a URL that can be accessed by anyone who knows it. This is useful when analysts want to share the current network status, as they no longer need to generate the image manually with their favorite tools and send it to other analysts.
• Automatically translate the "quick statistics" to the form: while the builder is designed to be as flexible as possible, it is convenient to automate some of the common tasks. Currently, the application automatically calculates hourly unique IP visits from the tabular data and populates the form with these numbers. Because analysts will want to automate different tasks depending on context, this simply serves as an example that analysts can extend and customize as they wish.

As discovered in the survey results, analysts also pay particular attention to an attacker's geographical location. Therefore, this view integrates Google Maps and plots all attackers' IP addresses from the log file. This involves three technical questions:

1. How to identify an attacker's IP address? What are the criteria for marking an IP address as malicious?

I wrote an RSS feed parser that parses the vulnerability feeds from the National Vulnerability Database and stores them in a local database. Meanwhile, the application uses a signature file from Snort (an intrusion detection system) as its basis, with slight modifications to adapt the signatures to the application. The application maintains associations between signatures and vulnerabilities. It scans the log entries to detect whether each entry contains a string from the signature file; an IP address is considered malicious when its associated log entry contains such a string.

2. How to effectively translate an IP address into a geographical representation?
The challenge is that Google Maps requires a longitude and latitude to display a point on its map, but it does not provide any native method for translating an IP address into a geographical location, and there are very few free databases dedicated to mapping IP addresses to geographical locations (longitude and latitude). As a result, the initial implementation was unreliable and inaccurate due to the small size of its IP database. A later implementation uses a third-party web service that actively collects IP addresses with their associated geographical locations, which significantly improves the accuracy of retrieving geographical locations for IPs.

3. What is the appropriate number of points to display on the map?

Showing too many points makes the map unusable, and everyone has their own preference. For the time being, the map shows only the top 5 IP addresses that attack the network most often and automatically adjusts the zoom level according to their distribution. In the future, I would like to allow users to enter the number of points to be displayed on the map.

Showing attackers' geographical locations provides a quick view of where the attackers are physically located and a sense of their distribution. Analysts can use this information to quickly tell whether the attackers come from a specific area or are evenly distributed over the world.

What vulnerabilities have been used against the network? Are these attacks successful? During the survey, analysts indicated strong interest in this information and stressed its importance. Knowing it helps analysts understand the security and health of their network and gather relevant information to compile status reports. As mentioned previously, the application maintains a mapping between the signature file and vulnerabilities to determine whether an activity is malicious. The application displays the list of vulnerabilities that have been used against the network. This list contains the CVE (Common Vulnerabilities and Exposures) number, as published in the National Vulnerability Database maintained by the National Institute of Standards and Technology, and technical details about each vulnerability. Since other components, such as visual builder and the Google map of attacker IPs, are also presented in this view, it is important to minimize the amount of information shown. Therefore, a user interface pattern called Accordion is used. Essentially, it shows only the selected entry and collapses the rest of the entries in the list, which reduces the cognitive load on analysts.

Future Work
Unfortunately, given the scope of the project and its time and technical limitations, there are many things which can still be improved. I had planned to integrate real-world systems such as firewalls and intrusion detection systems.
Figure 3: All log entries are presented in a tabular format, with enhanced functions such as searching, filtering, and highlighting. The figures on the left show quick statistics such as visiting IP addresses and their distribution.
Figure 4: Visual builder and the mapping service that displays IPs' geographical locations. Visual builder allows analysts to produce highly customizable visualizations by automatically populating data from the source.
Figure 5: This is a generated visualization that shows the number of attacks over
Figure 6: The accordion view shows detected vulnerabilities with their CVE (Common Vulnerabilities and Exposures) numbers. This view shows one selected vulnerability with its details at a time, and aims to reduce the amount of information presented to the
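
The step in which visual builder turns hourly unique IP counts into a shareable chart URL might look like the sketch below. It is an illustration under stated assumptions rather than the tool's implementation: the endpoint and the parameter names (cht, chs, chd, chds, chtt) follow the image-chart URL scheme of the Google Chart API as documented at the time, the timestamp layout is assumed to be Common Log Format, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Endpoint as documented around 2009. The image chart service has since been
# shut down, so treat the resulting URL as illustrative only.
CHART_BASE = "http://chart.apis.google.com/chart"


def hourly_unique_ips(entries):
    """Return 24 counts of distinct IPs, one per hour of the day.

    entries: iterable of (ip, timestamp) pairs, where the timestamp is
    assumed to look like '10/Oct/2008:13:55:36 -0700'.
    """
    seen = defaultdict(set)
    for ip, stamp in entries:
        hour = int(stamp.split(":")[1])  # field after the date is the hour
        seen[hour].add(ip)
    return [len(seen[h]) for h in range(24)]


def line_chart_url(counts, title, width=400, height=200):
    """Build a line-chart URL from a list of counts."""
    params = {
        "cht": "lc",                                   # chart type: line chart
        "chs": f"{width}x{height}",                    # size in pixels
        "chd": "t:" + ",".join(str(c) for c in counts),  # data, text encoding
        "chds": f"0,{max(counts) or 1}",               # scale data to its max
        "chtt": title,                                 # chart title
    }
    return CHART_BASE + "?" + urlencode(params, safe=":,|")


# Made-up entries using documentation-only IP ranges.
SAMPLE_ENTRIES = [
    ("203.0.113.5", "10/Oct/2008:13:55:36 -0700"),
    ("203.0.113.5", "10/Oct/2008:13:56:01 -0700"),
    ("198.51.100.7", "10/Oct/2008:13:59:00 -0700"),
    ("192.0.2.44", "10/Oct/2008:14:05:40 -0700"),
]

if __name__ == "__main__":
    counts = hourly_unique_ips(SAMPLE_ENTRIES)
    print(line_chart_url(counts, "Hourly unique IPs"))
```

Because the whole chart is encoded in the query string, sharing a visualization amounts to sharing the URL, which is the property the visual builder relies on.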
Acknowledgements
I would like to thank our Capstone advisors, Dave Hendry, Batya Friedman, and Barbara Endicott-Popovsky. Thank you to Jeremiah Grossman for providing useful statistical information.

References
[1] Verizon Business RISK Team. The 2009 Data Breach Investigations Report. 15 Apr. 2009. <http://www.verizonbusiness.com/resources/security/r
[2] Greg Conti. Security Data Visualization: Graphical Techniques for Network Analysis. Oct 2007, No Starch Press.
[3] WhiteHat Security. Website Security Statistics Report, 7th edition.
[4] Raffael Marty. Applied Security Visualization. Aug 2008, Addison Wesley.
[5] Google Chart API.
[6] National Vulnerability Database.