IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. X, XXX 2009




                                        Firewall Policy Queries
                        Alex X. Liu, Member, IEEE, and Mohamed G. Gouda, Member, IEEE

       Abstract—Firewalls are crucial elements in network security, and have been widely deployed in most businesses and institutions for
       securing private networks. The function of a firewall is to examine each incoming and outgoing packet and decide whether to accept or
       to discard the packet based on its policy. Due to the lack of tools for analyzing firewall policies, most firewalls on the Internet have been
       plagued with policy errors. A firewall policy error either creates security holes that will allow malicious traffic to sneak into a private
       network or blocks legitimate traffic and disrupts normal business processes, which in turn could lead to irreparable, if not tragic,
       consequences. Because a firewall may have a large number of rules and the rules often conflict, understanding and analyzing the
       function of a firewall has been known to be notoriously difficult. An effective way to assist firewall administrators to understand and
       analyze the function of their firewalls is by issuing queries. An example of a firewall query is “Which computers in the private network
       can receive packets from a known malicious host in the outside Internet?” Two problems need to be solved in order to make firewall
       queries practically useful: how to describe a firewall query and how to process a firewall query. In this paper, we first introduce a simple
       and effective SQL-like query language, called the Structured Firewall Query Language (SFQL), for describing firewall queries. Second,
       we give a theorem, called the Firewall Query Theorem, as the foundation for developing firewall query processing algorithms. Third, we
       present an efficient firewall query processing algorithm, which uses decision diagrams as its core data structure. Fourth, we propose
       methods for optimizing firewall query results. Finally, we present methods for performing the union, intersect, and minus operations on
       firewall query results. Our experimental results show that our firewall query processing algorithm is very efficient: it takes less than
       10 milliseconds to process a query over a firewall that has up to 10,000 rules.

       Index Terms—Network security, firewall queries, firewall testing, firewall correctness.


1    INTRODUCTION
SERVING as the first line of defense against malicious attacks and unauthorized traffic, firewalls are crucial elements in securing the private networks of most businesses, institutions, and even home networks. A firewall is placed at the point of entry between a private network and the outside Internet so that all incoming and outgoing packets have to pass through it. A packet can be viewed as a tuple with a finite number of fields; examples of these fields are source/destination IP address, source/destination port number, and protocol type. A firewall maps each incoming and outgoing packet to a decision according to its policy (i.e., configuration). A firewall policy defines which packets are legitimate and which are illegitimate by a sequence of rules. Each rule in a firewall policy is of the form

    $\langle predicate \rangle \rightarrow \langle decision \rangle$.

The $\langle predicate \rangle$ in a rule is a boolean expression over some packet fields and the network interface card (NIC) on which a packet arrives.$^1$ For the sake of brevity, we assume that each packet has a field that contains the identification of the network interface on which a packet arrives. The $\langle decision \rangle$ of a rule can be accept, or discard, or a combination of these decisions with other options such as the logging option. For simplicity, we assume that the $\langle decision \rangle$ in a rule is either accept or discard.

A packet matches a rule if and only if the packet satisfies the predicate of the rule. The predicate of the last rule in a firewall is usually a tautology to ensure that every packet has at least one matching rule in the firewall. The rules in a firewall often conflict. Two rules in a firewall conflict if and only if they overlap and they have different decisions. Two rules in a firewall overlap if and only if there is at least one packet that can match both rules. Due to conflicts among rules, a packet may match more than one rule in a firewall, and the rules that a packet matches may have different decisions. To resolve conflicts among rules, for each incoming or outgoing packet, a firewall maps it to the decision of the first (i.e., highest priority) rule that the packet matches. Note that two overlapping rules with different decisions syntactically conflict but semantically do not conflict because of the first-match semantics. In this paper, the definition of “conflict” among firewall rules is based on the syntax of rules.

. A.X. Liu is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824. E-mail: alexliu@cse.msu.edu.
. M.G. Gouda is with the Department of Computer Sciences, The University of Texas at Austin, 1 University Station (C0500), Austin, TX 78712-0233. E-mail: gouda@cs.utexas.edu.

Manuscript received 1 Apr. 2008; revised 6 Oct. 2008; accepted 8 Dec. 2008; published online 22 Dec. 2008.
Recommended for acceptance by M. Singhal.
For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-2008-04-0124.
Digital Object Identifier no. 10.1109/TPDS.2008.263.

   1. Note that most firewall vendors (such as Cisco [10]) allow administrators to specify NIC information in rules. For firewall products (such as Check Point firewalls [9]) that do not provide this functionality, we can use the Rule Assignment and Direction Setting (RADIS) algorithm in [5], [6] to automatically assign the NIC information to each rule. Wool gave a comprehensive discussion of the NIC issue in firewall rules in [44].

1045-9219/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.
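The notions of matching, overlap, conflict, and first-match resolution introduced above can be illustrated with a minimal Python sketch. It assumes that every predicate is a conjunction of one inclusive integer interval per field, which is the representation the paper adopts in its formal definitions. The `Rule` type, the function names, and the three-rule toy policy are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive bounds (low, high)


class Rule(NamedTuple):
    predicate: Tuple[Interval, ...]  # one interval per packet field
    decision: str                    # "accept" or "discard"


def matches(rule: Rule, packet: Sequence[int]) -> bool:
    """A packet matches a rule iff each field value lies in that field's interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(packet, rule.predicate))


def overlap(a: Rule, b: Rule) -> bool:
    """Two rules overlap iff some packet matches both, i.e. the
    intervals of the two rules intersect in every field."""
    return all(max(lo1, lo2) <= min(hi1, hi2)
               for (lo1, hi1), (lo2, hi2) in zip(a.predicate, b.predicate))


def conflict(a: Rule, b: Rule) -> bool:
    """Syntactic conflict: the rules overlap and have different decisions."""
    return overlap(a, b) and a.decision != b.decision


def first_match(firewall: Sequence[Rule], packet: Sequence[int]) -> str:
    """Return the decision of the first (highest priority) matching rule."""
    for rule in firewall:
        if matches(rule, packet):
            return rule.decision
    raise ValueError("incomplete rule sequence: no rule matches the packet")


# Hypothetical two-field policy (source, destination), each field in [1, 10].
r1 = Rule(((4, 7), (6, 8)), "accept")
r2 = Rule(((3, 8), (2, 9)), "discard")
r3 = Rule(((1, 10), (1, 10)), "accept")  # tautology: matches every packet
policy = [r1, r2, r3]

print(conflict(r1, r2))             # True: overlapping, different decisions
print(first_match(policy, (5, 6)))  # accept (r1 takes priority over r2)
print(first_match(policy, (8, 6)))  # discard (first match is r2)
```

Although `r1` and `r2` conflict syntactically, `first_match` always returns a single decision, which is the sense in which conflicting rules do not conflict semantically.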


The function (i.e., behavior) of a firewall is specified in its policy, which consists of a sequence of rules. The policy of a firewall is the most important component in achieving the security and functionality of the firewall [41]. However, most firewalls on the Internet are poorly configured, as witnessed by the success of worms$^2$ and viruses like Blaster [8] and Sapphire [11], which could be easily blocked by a well-configured firewall [43]. It has been observed that most firewall security breaches are caused by configuration errors [7]. An error in a firewall policy means that some illegitimate packets are identified as being legitimate, or some legitimate packets are identified as being illegitimate. Such a policy error either creates security holes that will allow malicious traffic to sneak into a private network or blocks legitimate traffic and disrupts normal business processes, which in turn could lead to irreparable, if not tragic, consequences. Clearly, a firewall policy should be well understood and analyzed before being deployed.

However, due to the large number of rules in a firewall and the large number of conflicts among rules, understanding and analyzing the function of a firewall has been known to be notoriously difficult [34], [35]. The implication of any rule in a firewall cannot be understood without examining all the rules listed above that rule. There are other factors that contribute to the difficulties in understanding and analyzing firewalls. For example, a corporate firewall often consists of rules that are written by different administrators at different times and for different reasons. It is difficult for new firewall administrators to understand the implication of each rule that they have not written.

An effective way to assist administrators to understand and analyze firewalls is by issuing firewall queries. Firewall queries are questions concerning the function of a firewall. An example firewall query is “Which computers in the private network can receive BOOTP$^3$ packets from the outside Internet?” Figuring out answers to such firewall queries is of tremendous help for a firewall administrator to understand and analyze the function of the firewall. For example, assume that the specification of a firewall requires that all computers in the outside Internet, except a known malicious host, are able to send e-mails to the mail server in the private network. A firewall administrator can test whether the firewall satisfies this requirement by issuing the firewall query “Which computers in the outside Internet cannot send e-mails to the mail server in the private network?” If the answer to this query contains exactly the known malicious host, then the firewall administrator is assured that the firewall does satisfy this requirement. Otherwise, the firewall administrator knows that the firewall fails to satisfy this requirement and needs to be reconfigured.

Firewall queries are also useful in a variety of other scenarios, such as firewall maintenance and firewall debugging. For a firewall administrator, checking whether a firewall satisfies certain conditions is a part of daily maintenance activity. For example, if the administrator detects that a computer in a private network is under attack, the firewall administrator can issue queries to check which other computers in the private network are also vulnerable to the same type of attacks. In the process of designing a firewall, the designer can issue some firewall queries to detect design errors by checking whether the answers to the queries are consistent with the firewall specification.

To make firewall queries practically useful, two problems need to be solved: how to describe a firewall query and how to process a firewall query. The second problem is especially difficult. Recall that the rules in a firewall are sensitive to the rule order and the rules often conflict. The naive solution is to enumerate every packet specified by a query and check the decision for each packet. For example, to process the query “which computers in the outside Internet cannot send any packet to the private network,” this naive solution needs to enumerate $2^{88}$ possible packets and check the decision of the firewall for each packet, which is infeasible. Note that firewall queries are inherently different from relational database queries. In a relational database, each field of a tuple has a fixed value. Although each rule in a firewall seems analogous to a tuple in a database, each field of a rule is a range, not a fixed value.

In this paper, we present solutions to both problems. First, we introduce a simple and effective SQL-like query language, called the Structured Firewall Query Language (SFQL), for describing firewall queries. This language uses queries of the form “select . . . from . . . where . . . .” Second, we present a theorem, called the Firewall Query Theorem, as the foundation for developing firewall query processing algorithms. Third, we present an efficient query processing algorithm that uses firewall decision diagrams as its core data structure. For a given firewall of a sequence of rules, we first construct an equivalent firewall decision diagram using a construction algorithm. Then, the firewall decision diagram is used as the core data structure of this query processing algorithm for answering each firewall query. Fourth, we propose methods for optimizing firewall query results. Finally, we present methods for performing the union, intersect, and minus operations on firewall query results. Our experimental results show that our firewall query processing algorithm is very efficient: it takes less than 10 milliseconds to process a query over a firewall that has up to 10,000 rules.

The rest of the paper proceeds as follows: We first review related work in Section 2. We then formally define our problem and notation in Section 3. In Section 4, we present the actual syntax of the structured firewall query language and show how to use this language to describe firewall queries. The theoretical foundation and algorithms for firewall query processing are presented in Section 5. In Section 6, we present methods for optimizing firewall query results. In Section 7, we present methods for performing the union, intersect, and minus operations on firewall query results. In Section 8, we show experimental results. Finally, we give concluding remarks in Section 9.

   2. Note that not all worms can be blocked by only examining packet headers.
   3. The Bootp protocol is used by workstations and other devices to obtain IP addresses and other information about the network configuration of a private network. Since there is no need to offer the service outside a private network, and it may offer useful information to hackers, usually Bootp packets are blocked from entering a private network.
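To give a feel for why enumerating $2^{88}$ packets is infeasible, the short Python sketch below works through the arithmetic. The breakdown of the 88 bits into field widths and the rate of one billion decision checks per second are assumptions made here for illustration; the text states only the total.

```python
# Field widths in bits. This breakdown is an assumption; the text gives
# only the total of 2^88 packets.
FIELD_BITS = {
    "source_ip": 32,
    "destination_ip": 32,
    "destination_port": 16,
    "protocol": 8,
}

CHECKS_PER_SECOND = 10**9          # hypothetical rate of decision checks
SECONDS_PER_YEAR = 365 * 24 * 3600


def packet_space_size(field_bits):
    """Number of distinct packets: the product of the field domain sizes."""
    size = 1
    for bits in field_bits.values():
        size *= 2 ** bits
    return size


total_packets = packet_space_size(FIELD_BITS)
years = total_packets / (CHECKS_PER_SECOND * SECONDS_PER_YEAR)

print(total_packets == 2 ** 88)  # True
print(f"about {years:.1e} years of enumeration at the assumed rate")
```

Under these assumptions the enumeration would run for billions of years, which is why a query processor has to reason over rule ranges rather than over individual packets.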
LIU AND GOUDA: FIREWALL POLICY QUERIES                                                                                                           3


2    RELATED WORK
There is little work that has been done on firewall queries. In the seminal work [34], [35], [42], a query-based firewall analysis system named Fang was presented. In Fang, a firewall query is described by a triple (a set of source addresses, a set of destination addresses, a set of services), where each service is a tuple (protocol type, source port number, and destination port number). The meaning of such a query is, “which IP addresses in the set of source addresses can use which services in the set of services to which IP addresses in the set of destination addresses.” We make three contributions in comparison with Fang. First, in processing a query on a firewall, our query processing algorithm is much more efficient than Fang. In Fang, a query is processed by comparing the query with every rule in a firewall in a linear fashion. In contrast, we first convert a firewall to a tree representation and then process queries on the tree, which is much more efficient. Second, our system can describe and process firewall queries over discard traffic, while Fang only supports queries over accept traffic. Third, we formulate firewall queries using an SQL-like language.

Some firewall analysis methods have been proposed in [4], [14], [15], [19], [25], [26], [29], [30], [31], [36]. In [29], Liu presented algorithms for performing the change impact analysis of firewall policies. In [30], Liu presented an algorithm for verifying firewall policies. The verification of distributed firewalls is studied in [19]. In [31], Liu and Gouda studied the redundancy issues in firewall policies and gave an algorithm for removing all the redundant rules in a firewall policy. In [26], some ad hoc “what if” questions that are similar to firewall queries were discussed. However, no algorithm was presented for processing the proposed “what if” questions. In [15], expert systems were proposed to analyze firewall rules. Clearly, building an expert system just for analyzing a firewall is overwrought and impractical. Detecting potential firewall policy errors by conflict detection was discussed in [4], [14], [25], [36]. Similar to conflict detection, some anomalies are defined and techniques for detecting anomalies are presented in [2], [47]. Examining each conflict or anomaly is helpful in reducing potential firewall policy errors; however, the number of conflicts or anomalies in a firewall is typically large, and manual checking of each conflict or anomaly is unreliable because the meaning of each rule depends on the current order of the rules in the firewall, which may be incorrect.

Some firewall design methods have been proposed in [5], [20], [21], [22], [24], [32]. These works aim at creating firewall rules, while we aim at analyzing firewall rules. Gouda and Liu proposed to use decision diagrams for designing firewalls in [20], [22]. In [32], Liu and Gouda applied the technique of design diversity to firewall design. Gouda and Liu also proposed a model for specifying stateful firewall policies [21]. Guttman proposed a Lisp-like language for specifying high-level packet filtering policies in [24]. Bartal et al. proposed a UML-like language for specifying global filtering policies in [5].

Design of high-performance ATM firewalls was discussed in [45], [46] with focus on firewall architectures. Firewall vulnerabilities were discussed and classified in [17], [28]. However, the focus of [17], [28] is the vulnerabilities of the packet filtering software and the supporting hardware part of a firewall, not the policy of a firewall.

There are some tools currently available for network vulnerability testing, such as Satan [16], [18] and Nessus [38]. These vulnerability testing tools scan a private network based on the current publicly known attacks, rather than the requirement specification of a firewall. Although these tools can possibly catch errors that allow illegitimate access to the private network, they cannot find the errors that disable legitimate communication between the private network and the outside Internet. Firewall policy testing was studied in [27].

3    FORMAL DEFINITIONS
We now formally define the concepts of fields, packets, and firewalls. A field $F_i$ is a variable whose domain, denoted $D(F_i)$, is a finite interval of nonnegative integers. For example, the domain of the source address field in an IP packet is $[0, 2^{32} - 1]$. A packet over the $d$ fields $F_1, \ldots, F_d$ is a $d$-tuple $(p_1, \ldots, p_d)$, where each $p_i$ ($1 \le i \le d$) is an element in $D(F_i)$. We use $\Sigma$ to denote the set of all packets over fields $F_1, \ldots, F_d$. It follows that $\Sigma$ is a finite set and $|\Sigma| = |D(F_1)| \times \cdots \times |D(F_d)|$, where $|\Sigma|$ denotes the number of elements in set $\Sigma$ and $|D(F_i)|$ denotes the number of elements in set $D(F_i)$ for each $i$.

A firewall rule has the form $\langle predicate \rangle \rightarrow \langle decision \rangle$. A $\langle predicate \rangle$ defines a set of packets over the fields $F_1$ through $F_d$ specified by the predicate $F_1 \in S_1 \wedge \cdots \wedge F_d \in S_d$, where each $S_i$ is one nonempty interval that is a subset of $D(F_i)$. A packet $(p_1, \ldots, p_d)$ matches a predicate $F_1 \in S_1 \wedge \cdots \wedge F_d \in S_d$, and the corresponding rule, if and only if the condition $p_1 \in S_1 \wedge \cdots \wedge p_d \in S_d$ holds. We refer to the set of possible values that $\langle decision \rangle$ can be as the decision set. Typical elements of the decision set include accept, discard, accept with logging, and discard with logging. For any $i$, if $S_i = D(F_i)$, we often use the keyword all to denote $S_i$.

Some existing firewall products, such as Linux’s ipchains [1], represent source and destination IP addresses as prefixes in their rules. An example of a prefix is 192.168.0.0/16 or 192.168.*.*, both of which represent the set of IP addresses in the range from 192.168.0.0 to 192.168.255.255. Essentially, each prefix represents one integer interval (as we can treat an IP address as a 32-bit integer). In this paper, we uniformly represent firewall rules using intervals.

A firewall $f$ over the $d$ fields $F_1, \ldots, F_d$ is a sequence of firewall rules. The size of $f$, denoted $|f|$, is the number of rules in $f$. A sequence of rules $\langle r_1, \ldots, r_n \rangle$ is complete if and only if for any packet $p$, there is at least one rule in the sequence that $p$ matches. A sequence of rules needs to be complete for it to serve as a firewall. To ensure that a firewall is complete, the predicate of the last rule in a firewall is usually specified as $F_1 \in D(F_1) \wedge \cdots \wedge F_d \in D(F_d)$, which every packet matches.

Fig. 1 shows an example of a simple firewall. In this example, we assume that each packet has only two fields: S (source address) and D (destination address), and both


fields have the same domain $[1, 10]$. Let $f_1$ be the name of this firewall.

Fig. 1. Example firewall $f_1$.

Two rules in a firewall may overlap; that is, a single packet may match both rules. For example, rules $r_1$ and $r_2$ in the firewall in Fig. 1 overlap. Furthermore, two rules in a firewall may conflict; that is, the two rules not only overlap but also have different decisions. For example, rules $r_1$ and $r_2$ in the firewall in Fig. 1 not only overlap but also conflict. To resolve such conflicts, firewalls typically employ a first-match resolution strategy where the decision for a packet $p$ is the decision of the first (i.e., highest priority) rule that $p$ matches in $f$. The decision that firewall $f$ makes for packet $p$ is denoted $f(p)$.

We can think of a firewall $f$ as defining a many-to-one mapping function from $\Sigma$ to the set of possible decisions. Two firewalls $f_1$ and $f_2$ are equivalent, denoted $f_1 \equiv f_2$, if and only if they define the same mapping function from $\Sigma$ to the set of possible decisions; that is, for any packet $p \in \Sigma$, we have $f_1(p) = f_2(p)$. For any firewall $f$, we use $\{f\}$ to denote the set of firewalls that are semantically equivalent to $f$.

4      STRUCTURED FIREWALL QUERY LANGUAGE
In this section, we present the syntax of our firewall query language and show how to use this language to describe firewall queries.

4.1 Query Language
A query, denoted $Q$, in our Structured Firewall Query Language (SFQL) is of the following format:

    select $F_i$
    from $f$
    where $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$

where $F_i$ is one of the fields $F_1, \ldots, F_d$, $f$ is a firewall, each $S_j$ is a nonempty subset of the domain $D(F_j)$ of field $F_j$, and $\langle dec \rangle$ is either accept or discard.

The result of query $Q$, denoted $Q.result$, is the following set:

    $\{ p_i \mid (p_1, \ldots, p_d)$ is a packet in $\Sigma$, and $(p_1 \in S_1) \wedge \cdots \wedge (p_d \in S_d) \wedge (f((p_1, \ldots, p_d)) = \langle dec \rangle) \}$.

Recall that $\Sigma$ denotes the set of all packets, and $f((p_1, \ldots, p_d))$ denotes the decision to which firewall $f$ maps the packet $(p_1, \ldots, p_d)$.

We can get the above set by first finding all the packets $(p_1, \ldots, p_d)$ in $\Sigma$ such that the following condition holds,

    $(p_1 \in S_1) \wedge \cdots \wedge (p_d \in S_d) \wedge (f((p_1, \ldots, p_d)) = \langle dec \rangle)$,

then projecting all these packets to the field $F_i$.

For example, a question to the firewall in Fig. 1, “Which computers whose addresses are in the set $[4, 8]$ can send packets to the computer whose address is 6?,” can be formulated as the following query using SFQL:

    select $S$
    from $f_1$
    where $(S \in [4, 8]) \wedge (D \in \{6\}) \wedge (decision = accept)$

The result of this query is $\{4, 5, 6, 7\}$.

As another example, a question to the firewall in Fig. 1, “Which computers cannot send packets to the computer whose address is 6?,” can be formulated as the following query using SFQL:

    select $S$
    from $f_1$
    where $(S \in all) \wedge (D \in \{6\}) \wedge (decision = discard)$

The result of this query is $\{3, 8\}$.

4.2 Firewall Query Examples
Next, we give some example firewall queries using SFQL. Let $f$ be the name of the firewall that resides on the gateway router in Fig. 2. This gateway router has two interfaces: interface 0, which connects the gateway router to the outside Internet, and interface 1, which connects the gateway router to the inside local network. Attached to the private network are a mail server and two hosts, host 1 and host 2. In these examples, we assume each packet has the following five fields: I (Interface), S (Source IP), D (Destination IP), N (Destination Port), and P (Protocol Type).

   4. Bootp packets are UDP packets and use port number 67 or 68.
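The definition of a query result in Section 4.1 can be read directly as a brute-force procedure: enumerate the packets, keep those that satisfy the where clause and receive the requested decision, and project onto the selected field. The Python sketch below does exactly that on the two-field domain $[1, 10]$. It is not the paper's query processing algorithm, only an executable reading of the definition. The figure for $f_1$ is not reproduced in this text, so the three rules in `F1` are hypothetical, chosen to be consistent with the two query results quoted above; the function names are likewise assumptions.

```python
from itertools import product

DOMAIN = range(1, 11)   # fields S and D both range over [1, 10]
S, D = 0, 1             # field indices within a packet tuple
ALL = set(DOMAIN)       # the keyword "all" for a field

# Hypothetical stand-in for firewall f1: (predicate, decision) pairs,
# where a predicate holds one inclusive interval per field.
F1 = [
    (((4, 7), (6, 8)), "accept"),
    (((3, 8), (2, 9)), "discard"),
    (((1, 10), (1, 10)), "accept"),
]


def decide(firewall, packet):
    """First-match semantics: decision of the first rule the packet matches."""
    for predicate, decision in firewall:
        if all(lo <= v <= hi for v, (lo, hi) in zip(packet, predicate)):
            return decision
    raise ValueError("incomplete firewall: no rule matches the packet")


def query(firewall, select, where, decision, domains):
    """Evaluate 'select F_i from f where ... and decision = dec' by enumeration.

    select   -- index i of the field to project onto
    where    -- one set S_j of allowed values per field
    decision -- "accept" or "discard"
    domains  -- one iterable of values per field
    """
    result = set()
    for packet in product(*domains):
        in_where = all(v in allowed for v, allowed in zip(packet, where))
        if in_where and decide(firewall, packet) == decision:
            result.add(packet[select])
    return result


domains = [DOMAIN, DOMAIN]
can_send = query(F1, S, [set(range(4, 9)), {6}], "accept", domains)
cannot_send = query(F1, S, [ALL, {6}], "discard", domains)
print(sorted(can_send))     # [4, 5, 6, 7]
print(sorted(cannot_send))  # [3, 8]
```

Enumeration is workable here only because the domain holds 100 packets; on real packet fields the same loop runs into the $2^{88}$ blow-up discussed in the Introduction.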




                                                                       Fig. 2. Firewall f.




                                                                       Fig. 3. Consistent firewall f2 .

                                                                       if and only if there are at least two rules in the firewall that
                                                                       conflict. Note that a firewall with conflicting rules is
                                                                       syntactically inconsistent, but semantically consistent be-
                                                                       cause of the first-match semantics. In this paper, our
                                                                       definitions of “consistent firewalls” and “inconsistent fire-
                                                                       walls” are based on the syntax of firewalls.
Recall that two rules in a firewall conflict if and only if they have different decisions and there is at least one packet that matches both rules. For example, the first two rules in the firewall in Fig. 1, namely $r_1$ and $r_2$, conflict. Note that for any two rules in a consistent firewall, if they overlap, i.e., there is at least one packet that can match both rules, they have the same decision. So, given a packet and a consistent firewall, all the rules in the firewall that the packet matches have the same decision. Fig. 1 shows an example of an inconsistent firewall, and Fig. 3 shows an example of a consistent firewall. Note that these two firewalls are equivalent. In these two firewall examples, we assume that each packet only has two fields: S (source address) and D (destination address), and both fields have the same domain $[1, 10]$.
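The definitions of conflicting rules and consistent firewalls can be illustrated with a minimal Python sketch. It assumes each rule is a pair of a predicate (a mapping from field name to a set of integer values) and a decision; the helper names `span`, `overlap`, `conflict`, and `is_consistent`, and the two small rule lists, are hypothetical and are not the firewalls of Fig. 1 or Fig. 3.

```python
from itertools import combinations

def span(lo, hi):
    """Integer interval [lo, hi] as a set (illustrative helper)."""
    return set(range(lo, hi + 1))

def overlap(rule_a, rule_b):
    """True if at least one packet matches both rules."""
    pred_a, pred_b = rule_a[0], rule_b[0]
    # Two rules overlap when their value sets intersect on every field.
    return all(pred_a[field] & pred_b[field] for field in pred_a)

def conflict(rule_a, rule_b):
    """Two rules conflict if they overlap and have different decisions."""
    return overlap(rule_a, rule_b) and rule_a[1] != rule_b[1]

def is_consistent(rules):
    """A firewall is consistent if no two of its rules conflict."""
    return not any(conflict(a, b) for a, b in combinations(rules, 2))

# Hypothetical rule lists over fields S and D with domain [1, 10].
inconsistent = [
    ({"S": span(4, 7), "D": span(6, 8)}, "accept"),
    ({"S": span(3, 8), "D": span(2, 8)}, "discard"),
    ({"S": span(1, 10), "D": span(1, 10)}, "accept"),
]
consistent = [
    ({"S": span(4, 7), "D": span(6, 8)}, "accept"),
    ({"S": span(4, 7), "D": span(2, 5)}, "discard"),
    ({"S": span(1, 3), "D": span(1, 10)}, "accept"),
]

print(is_consistent(inconsistent), is_consistent(consistent))
```

Representing field values as explicit sets keeps the sketch short; for realistic address ranges an interval representation would be needed instead.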
                                                                           Our interest in consistent firewalls is twofold. First, as
                                                                       discussed in Section 5.3, each inconsistent firewall can be
                                                                       converted to an equivalent consistent firewall. Second, as
                                                                       shown in the following theorem, it is easier to process
                                                                       queries for consistent firewalls than for inconsistent ones.
5 FIREWALL QUERY PROCESSING
5.1 Theory Foundation
In this section, we discuss how to process a firewall query for consistent firewalls. A firewall is consistent if and only if no two rules in the firewall conflict. A firewall is inconsistent

  5. SMTP stands for Simple Mail Transfer Protocol. SMTP packets are TCP packets and use port number 25.

Theorem 1 (Firewall Query Theorem). Let $Q$ be a query of the following form:

    select $F_i$
    from $f$
    where $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$

Also let $f$ be a consistent firewall that consists of $n$ rules $r_1, \ldots, r_n$, where each rule $r_j$ is of the form $(F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$. Then,

\[ Q.result = \bigcup_{j=1}^{n} Q.r_j, \]

where each $Q.r_j$ is defined using rule $r_j$ as follows:

\[ Q.r_j = \begin{cases} S_i \cap S'_i, & \text{if } (S_1 \cap S'_1 \neq \emptyset) \wedge \cdots \wedge (S_d \cap S'_d \neq \emptyset) \wedge (\langle dec \rangle = \langle dec' \rangle), \\ \emptyset, & \text{otherwise.} \end{cases} \]
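The Firewall Query Theorem can be exercised on a small case. The following Python sketch computes $Q.r_j$ for each rule and takes the union, assuming field values are modeled as sets of integers. The function names and the six-rule consistent firewall are hypothetical illustrations, not taken from the paper's figures.

```python
def span(lo, hi):
    """Integer interval [lo, hi] as a set (illustrative helper)."""
    return set(range(lo, hi + 1))

def rule_result(rule, selected, query_sets, query_dec):
    """Q.r_j for one rule: S_i ∩ S'_i if every field overlaps and the
    decisions agree, otherwise the empty set."""
    rule_sets, rule_dec = rule
    overlaps = all(query_sets[f] & rule_sets[f] for f in query_sets)
    if overlaps and rule_dec == query_dec:
        return query_sets[selected] & rule_sets[selected]
    return set()

def rule_based_query(rules, selected, query_sets, query_dec):
    """Q.result as the union of Q.r_j over all rules of a consistent firewall."""
    result = set()
    for rule in rules:
        result |= rule_result(rule, selected, query_sets, query_dec)
    return result

ALL = span(1, 10)
# Hypothetical consistent firewall over S and D with domain [1, 10].
firewall = [
    ({"S": span(4, 7), "D": span(6, 8)}, "accept"),
    ({"S": span(4, 7), "D": span(2, 5)}, "discard"),
    ({"S": span(4, 7), "D": {1, 9, 10}}, "accept"),
    ({"S": {3, 8}, "D": span(2, 8)}, "discard"),
    ({"S": {3, 8}, "D": {1, 9, 10}}, "accept"),
    ({"S": {1, 2, 9, 10}, "D": ALL}, "accept"),
]

# select S from firewall where S in [1, 10] and D in {6} and decision = discard
answer = rule_based_query(firewall, "S", {"S": ALL, "D": {6}}, "discard")
print(sorted(answer))  # [3, 8]
```

Only the fourth rule both overlaps the query on every field and carries the queried decision, so it alone contributes to the result.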
Proof. First, we prove $\bigcup_{j=1}^{n} Q.r_j \subseteq Q.result$ by proving $Q.r_j \subseteq Q.result$ for each $j$. Let $r_j$ be $(F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$. If $(S_1 \cap S'_1 \neq \emptyset) \wedge \cdots \wedge (S_d \cap S'_d \neq \emptyset) \wedge (\langle dec \rangle = \langle dec' \rangle)$ does not hold, then $Q.r_j = \emptyset$ and clearly $Q.r_j \subseteq Q.result$. If $(S_1 \cap S'_1 \neq \emptyset) \wedge \cdots \wedge (S_d \cap S'_d \neq \emptyset) \wedge (\langle dec \rangle = \langle dec' \rangle)$ does hold, then $Q.r_j = S_i \cap S'_i$. Because of rule $r_j$, for any packet $(p_1, \ldots, p_d)$ that satisfies $(p_1 \in S_1 \cap S'_1) \wedge \cdots \wedge (p_d \in S_d \cap S'_d)$, we have $f.(p_1, \ldots, p_d) = \langle dec' \rangle$. Since $\langle dec \rangle = \langle dec' \rangle$, the following set

\[ \{ p_i \mid (p_1, \ldots, p_d) \text{ is a packet in } \Sigma, \text{ and } (p_1 \in S_1 \cap S'_1) \wedge \cdots \wedge (p_d \in S_d \cap S'_d) \} \]

is a subset of $Q.result$. Because each $S_k \cap S'_k$ ($1 \le k \le d$) is a nonempty set, we have

\[ Q.r_j = S_i \cap S'_i = \{ p_i \mid (p_1, \ldots, p_d) \text{ is a packet in } \Sigma, \text{ and } (p_1 \in S_1 \cap S'_1) \wedge \cdots \wedge (p_d \in S_d \cap S'_d) \}. \]

So, $Q.r_j \subseteq Q.result$.

Second, we prove $Q.result \subseteq \bigcup_{j=1}^{n} Q.r_j$. Consider a $p_i$ in $Q.result$. By the definition of $Q.result$, there is at least one packet $(p_1, \ldots, p_d)$ such that the condition $(p_1 \in S_1) \wedge \cdots \wedge (p_d \in S_d) \wedge (f.(p_1, \ldots, p_d) = \langle dec \rangle)$ holds. Let rule $r_j$, $(F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$, be a rule that $(p_1, \ldots, p_d)$ matches in $f$. Because all the rules in the consistent firewall $f$ that $(p_1, \ldots, p_d)$ matches have the same decision, we have $f.(p_1, \ldots, p_d) = \langle dec' \rangle$. So $\langle dec \rangle = \langle dec' \rangle$. Because each $p_k$ is an element of $S_k \cap S'_k$, we have $(S_1 \cap S'_1 \neq \emptyset) \wedge \cdots \wedge (S_d \cap S'_d \neq \emptyset) \wedge (\langle dec \rangle = \langle dec' \rangle)$. So $Q.r_j = S_i \cap S'_i$. Therefore, $p_i \in Q.r_j$. So $Q.result \subseteq \bigcup_{j=1}^{n} Q.r_j$. $\Box$

5.2 Rule-Based Firewall Query Processing
The Firewall Query Theorem implies a simple query processing algorithm: given a consistent firewall $f$ that consists of $n$ rules $r_1, \ldots, r_n$ and a query $Q$, compute $Q.r_j$ for each rule $r_j$; then $\bigcup_{j=1}^{n} Q.r_j$ is the result of query $Q$. We call this algorithm the rule-based firewall query processing algorithm. Algorithm 1 shows the pseudocode of this algorithm.

Algorithm 1. Rule-Based Firewall Query Processing Algorithm
  Input: (1) A consistent firewall $f$ that consists of $n$ rules: $r_1, \ldots, r_n$.
         (2) A query $Q$: select $F_i$ from $f$ where $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$
  Output: Result of query $Q$.

  $Q.result := \emptyset$;
  for $j := 1$ to $n$ do
      /* Let $r_j = (F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$ */
      if $(S_1 \cap S'_1 \neq \emptyset) \wedge \cdots \wedge (S_d \cap S'_d \neq \emptyset) \wedge (\langle dec \rangle = \langle dec' \rangle)$ then
          $Q.result := Q.result \cup (S_i \cap S'_i)$;
  return $Q.result$;

5.3 FDD-Based Firewall Query Processing Algorithm
Observe that multiple rules in a consistent firewall may share the same prefix. For example, in the consistent firewall in Fig. 3, the first three rules, namely $r'_1$, $r'_2$, and $r'_3$, share the same prefix $S \in [4, 7]$. Thus, if we apply the query processing algorithm in Algorithm 1 to answer a query, for instance, whose “where clause” contains the conjunct $S \in \{3\}$, over the firewall in Fig. 3, then the algorithm will repeat three times the calculation of $\{3\} \cap [4, 7]$. Clearly, if we reduce the number of these repeated calculations, the efficiency of the firewall query processing algorithm can be greatly improved.

Next, we present a more efficient firewall query processing algorithm that has no repeated calculations and can be applied to both consistent and inconsistent firewalls. The basic idea of this query processing algorithm is as follows: First, we convert the firewall (whether consistent or inconsistent) that we want to query to an equivalent firewall decision diagram. Second, because the resulting firewall decision diagram is a consistent and compact representation of the original firewall, it is used as the core data structure for query processing. We call this algorithm the FDD-based firewall query processing algorithm.

Here, we give a brief introduction to firewall decision diagrams [20]. A similar data structure was used by Rovniagin and Wool in [40] and by Dobkin and Lipton in [12].

Definition 1 (Firewall Decision Diagram). A Firewall Decision Diagram (FDD) $f$ with a decision set $DS$ and over fields $F_1, \ldots, F_d$ is an acyclic and directed graph that has the following five properties:

    1. There is exactly one node in $f$ that has no incoming edges. This node is called the root of $f$. The nodes in $f$ that have no outgoing edges are called terminal nodes of $f$.
    2. Each node $v$ has a label, denoted $F(v)$, such that
       \[ F(v) \in \begin{cases} \{F_1, \ldots, F_d\}, & \text{if } v \text{ is a nonterminal node,} \\ DS, & \text{if } v \text{ is a terminal node.} \end{cases} \]
    3. Each edge $e$ in $f$ has a label, denoted $I(e)$, such that if $e$ is an outgoing edge of node $v$, then $I(e)$ is a nonempty subset of $D(F(v))$.
    4. A directed path in $f$ from the root to a terminal node is called a decision path of $f$. No two nodes on a decision path have the same label.
    5. The set of all outgoing edges of a node $v$ in $f$, denoted $E(v)$, satisfies the following two conditions:
       a. Consistency: $I(e) \cap I(e') = \emptyset$ for any two distinct edges $e$ and $e'$ in $E(v)$,
       b. Completeness: $\bigcup_{e \in E(v)} I(e) = D(F(v))$.

We define a full-length ordered FDD as an FDD where in each decision path all fields appear exactly once and in the same order. For ease of presentation, as the rest of this paper only concerns full-length ordered FDDs, we use the term “FDD” to mean “full-length ordered FDD” if not otherwise specified.

Fig. 4 shows an example FDD named $f_3$. In this example, we assume that each packet has only two fields: S (source address) and D (destination address), and both fields have the same domain $[1, 10]$. In the rest of this paper, including this example, we use “a” as a shorthand for accept and “d”
as a shorthand for discard. Note that this FDD represents the firewall in Fig. 1 and also the firewall in Fig. 3.

Fig. 4. Firewall decision diagram $f_3$.

A decision path in an FDD $f$ is represented by $(v_1 e_1 \cdots v_k e_k v_{k+1})$ where $v_1$ is the root, $v_{k+1}$ is a terminal node, and each $e_i$ is a directed edge from node $v_i$ to node $v_{i+1}$. A decision path $(v_1 e_1 \cdots v_k e_k v_{k+1})$ in an FDD defines the following rule:

\[ F_1 \in S_1 \wedge \cdots \wedge F_d \in S_d \rightarrow F(v_{k+1}), \]

where

\[ S_i = \begin{cases} I(e_j), & \text{if there is a node } v_j \text{ in the decision path that is labeled with field } F_i, \\ D(F_i), & \text{if no node in the decision path is labeled with field } F_i. \end{cases} \]

For an FDD $f$, we use $S_f$ to denote the set of all the rules defined by all the decision paths of $f$. For any packet $p$, there is one and only one rule in $S_f$ that $p$ matches because of the consistency and completeness properties; therefore, $f$ maps $p$ to the decision of the only rule that $p$ matches.

Considering the FDD $f_3$ in Fig. 4, Fig. 3 shows all the six rules in $S_{f_3}$.

Given an FDD $f$, any sequence of rules that consists of all the rules in $S_f$ is equivalent to $f$. The order of the rules in such a firewall is immaterial because the rules in $S_f$ are nonoverlapping.

Given a sequence of rules, we can construct an equivalent FDD using the FDD construction algorithm in [32]. For example, the FDD generated from the firewall in Fig. 1 is shown in Fig. 4.

The algorithm for converting an inconsistent firewall $f$ to a consistent firewall consists of two steps. First, convert $f$ to an equivalent FDD $f'$ by the construction algorithm in this section. Second, generate a rule for each decision path in $f'$, i.e., obtain $S_{f'}$; then any sequence of rules that consists of all the rules in $S_{f'}$ is a consistent firewall that is equivalent to $f$.

The pseudocode of the FDD-based firewall query processing algorithm is shown in Algorithm 2. This algorithm has two inputs, an FDD and a query described by SFQL. Note that $Q.result$ is a global variable. Also note that our assumption of using full-length ordered FDDs is only for simplifying the presentation of this paper. In practice, the FDDs used for processing firewall queries do not need to be full-length or ordered. Our FDD-based firewall query processing algorithm can be easily adapted for processing queries on FDDs that are not full-length or not ordered.

This FDD-based firewall query processing algorithm works as follows. Suppose the two inputs of this algorithm are an FDD $f$ and a query $Q$:

    select $F_i$
    from $f$
    where $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$.

The algorithm starts by traversing the FDD from its root. Let $F_j$ be the label of the root. For each outgoing edge $e$ of the root, we compute $I(e) \cap S_j$. If $I(e) \cap S_j = \emptyset$, we skip edge $e$ and do not traverse the subgraph that $e$ points to. If $I(e) \cap S_j \neq \emptyset$, then we continue to traverse the subgraph that $e$ points to in a similar fashion. Whenever a terminal node is encountered, we compare the label of the terminal node and $\langle dec \rangle$. If they are the same, assuming the rule defined by the decision path containing the terminal node is $(F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$, then add $S_i \cap S'_i$ to $Q.result$. In this pseudocode and the rest of this paper, we use $e.t$ to denote the (target) node that the edge $e$ points to.

Algorithm 2. FDD-Based Firewall Query Processing Algorithm
  Input: (1) An FDD $f$.
         (2) A query $Q$: select $F_i$ from $f$ where $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$
  Output: Result of query $Q$.

  $Q.result := \emptyset$;
  CHECK($f.root$, $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$);
  return $Q.result$;

  CHECK($v$, $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$)
      if ($v$ is a terminal node) and ($F(v) = \langle dec \rangle$) then
          Let $(F_1 \in S'_1) \wedge \cdots \wedge (F_d \in S'_d) \rightarrow \langle dec' \rangle$ be the rule defined by the decision path containing node $v$;
          $Q.result := Q.result \cup (S_i \cap S'_i)$;
      if ($v$ is a nonterminal node) then
          /* Let $F_j$ be the label of $v$ */
          for each edge $e$ in $E(v)$ do
              if $I(e) \cap S_j \neq \emptyset$ then
                  CHECK($e.t$, $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \wedge (decision = \langle dec \rangle)$);

5.4 Efficient FDD Reduction Using Hashing
To further improve the efficiency of the FDD-based firewall query processing algorithm, after we convert a firewall to an equivalent FDD, we need to reduce the size of the FDD. A full-length ordered FDD is reduced if and only if no two nodes are isomorphic and no two nodes have more than one edge between them. Two nodes $v$ and $v'$ in an FDD are isomorphic if and only if $v$ and $v'$ satisfy one of the following two conditions: 1) both $v$ and $v'$ are terminal nodes with identical labels; 2) both $v$ and $v'$ are nonterminal nodes and there is a one-to-one correspondence between the outgoing edges of $v$ and the outgoing edges of $v'$ such that every pair of corresponding edges have identical labels and they both point to the same node. Fig. 5 shows an FDD before reduction and Fig. 6 shows the corresponding FDD after reduction.

A brute force deep comparison algorithm for FDD reduction was proposed in [22]. In this paper, we use a more efficient FDD reduction algorithm that processes the
nodes level by level from the terminal nodes to the root node using signatures to speed up comparisons [33].

Fig. 5. A full-length ordered FDD before reduction.

Fig. 6. A full-length ordered FDD after reduction.

Starting from the bottom level, at each level, we compute a signature for each node at that level. For a terminal node $v$, set $v$'s signature to be its label. For a nonterminal node $v$, suppose $v$ has $k$ children $v_1, v_2, \ldots, v_k$, in increasing order of signature ($Sig(v_i) < Sig(v_{i+1})$ for $1 \le i \le k - 1$), and the edge between $v$ and its child $v_i$ is labeled with $E_i$, a sequence of nonoverlapping prefixes in increasing order. Set the signature of node $v$ as follows:

\[ Sig(v) = h(Sig(v_1), E_1, \ldots, Sig(v_k), E_k), \]

where $h$ is a one-way and collision resistant hash function such as MD5 [39] or SHA-1 [13]. For any such hash function $h$, given two different inputs $x$ and $y$, the probability of $h(x) = h(y)$ is extremely small.

After we have assigned signatures to all nodes at a given level, we search for isomorphic subgraphs as follows: For every pair of nodes $v_i$ and $v_j$ ($1 \le i \neq j \le k$) at this level, if $Sig(v_i) \neq Sig(v_j)$, then we can conclude that $v_i$ and $v_j$ are not isomorphic; otherwise, we explicitly determine if $v_i$ and $v_j$ are isomorphic. If $v_i$ and $v_j$ are isomorphic, we delete node $v_j$ and its outgoing edges, and redirect all the edges that point to $v_j$ to point to $v_i$. Further, we eliminate double edges between node $v_i$ and its parents.

For example, the signatures of the non-root nodes in Fig. 5 are computed as follows:

\[ Sig(v_4) = a, \]
\[ Sig(v_5) = d, \]
\[ Sig(v_2) = h(Sig(v_4), [4, 8], Sig(v_5), [1, 3], [9, 10]), \]
\[ Sig(v_3) = h(Sig(v_4), [4, 8], Sig(v_5), [1, 3], [9, 10]). \]

Note that we can perform further optimization of removing nonterminal nodes that have only one outgoing edge in an FDD, which is similar to the path compression technique for binary tries [37].

5.5 Complexity Analysis of Firewall Query Processing Algorithms
Next, we analyze the complexity of rule-based and FDD-based firewall query processing algorithms, which shows that the FDD-based algorithm is much more efficient.

5.5.1 Complexity of Rule-Based Firewall Query Processing Algorithm
Given a firewall, which may be consistent or inconsistent, we first need to convert it to an equivalent consistent firewall using the algorithm in [32]. Given a firewall with $n$ rules where each rule examines $d$ packet fields, its equivalent consistent firewall will have $O(n^d)$ rules. As the rule-based firewall query algorithm linearly scans every rule and performs $d$ comparisons for each rule, its complexity is $O(n^{d+1})$.

5.5.2 Complexity of FDD-Based Firewall Query Processing Algorithm
As every nonterminal node in a reduced FDD cannot have more than $2n - 1$ outgoing edges, finding the right outgoing edge to traverse takes $O(\log n)$ time using binary search. Let $k$ be the total number of paths that a query overlaps on an FDD; then the processing time for the query is $O(kd \log n)$. Note that $k$ is typically small.

6 FIREWALL QUERY POSTPROCESSING
To keep our presentation simple, we have described a somewhat watered-down version of the firewall query language where the “select” clause in a query has only one field. In fact, the “select” clause in a query can be extended to have more than one field. The results in this paper, e.g., the Firewall Query Theorem and the two firewall query processing algorithms, can all be extended in a straightforward manner to accommodate the extended “select” clauses.

However, when the “select” clause in a query has more than one field, the query result may contain many disjoint multidimensional predicates. For example, consider the following query on the firewall in Fig. 1, the FDD of which is shown in Fig. 4.

Running the FDD-based firewall query processing algorithm, the result contains the following two predicates:

\[ S \in [4, 7] \wedge D \in [2, 5], \]
\[ S \in ([3, 3] \vee [8, 8]) \wedge D \in [2, 5]. \]

To make the query result easier for firewall administrators to read, we next present an algorithm to minimize the number of predicates generated from the firewall query engine. This algorithm consists of three steps. In the first step, we treat every predicate as a firewall rule and convert
these nonoverlapping rules with the same decisions to an equivalent partial FDD. A partial FDD is a diagram that has all the properties of an FDD except the completeness property. For example, we treat the above two predicates as the following two rules, where the decision of each rule is the decision in the “where” clause:

\[ S \in [4, 7] \wedge D \in [2, 5] \rightarrow d, \]
\[ S \in ([3, 3] \vee [8, 8]) \wedge D \in [2, 5] \rightarrow d. \]

Fig. 7 shows the converted partial FDD from these two rules.

Fig. 7. A partial FDD before reduction.

In the second step, we run the FDD reduction algorithm on the partial FDD. Essentially, this step combines some predicates together. Fig. 8 shows the reduced partial FDD.

Fig. 8. A reduced partial FDD.

In the third step, we generate predicates from the reduced partial FDD. The predicate that is generated from the reduced partial FDD in Fig. 8 is

\[ S \in [3, 8] \wedge D \in [2, 5]. \]

Alternately, we can simply present the reduced partial FDD as the query result.

In some cases, minimizing the number of predicates generated from the firewall query engine may not be the best way to present query results to firewall administrators. It is certainly worth investigating better ways to present query results to administrators. We leave a comprehensive study of this issue to future work.

7 FIREWALL QUERY ALGEBRA
Similar to SQL, a complex firewall query can be formulated by the union, intersect, or minus of multiple queries. In this section, we present algorithms for processing such complex firewall queries.

7.1 Union
Performing the union of two firewall query results is simple: Combine the two sets of predicates into one set and then run the firewall query postprocessing algorithm to minimize the number of predicates.

7.2 Intersect
The intersect of two firewall query results $A_1$ and $A_2$ can be done by simply intersecting every predicate in $A_1$ and every predicate in $A_2$. More formally, $A_1 \cap A_2 = \{P_1 \cap P_2 \mid P_1 \in A_1, P_2 \in A_2\}$. Given two predicates $P_1 = F_1 \in S_1 \wedge \cdots \wedge F_d \in S_d$ and $P_2 = F_1 \in S'_1 \wedge \cdots \wedge F_d \in S'_d$, $P_1 \cap P_2 = F_1 \in (S_1 \cap S'_1) \wedge \cdots \wedge F_d \in (S_d \cap S'_d)$. For example, the intersect of two predicates $S \in [3, 8] \wedge D \in [2, 5]$ and $S \in [6, 9] \wedge D \in [4, 7]$ is $S \in [6, 8] \wedge D \in [4, 5]$. As the predicates in any query result are nonoverlapping, for any predicate in $A_1$, it overlaps at most one predicate in $A_2$.

7.3 Minus
Computing the minus of one firewall query result from another one is more involved. Given two firewall query results $A_1$ and $A_2$, we compute $A_1 - A_2$ as follows. In the first step, we build a partial FDD from the rules formed by the predicates in $A_2$. In the second step, for each predicate $P$ in $A_1$, we append a rule $r$ formed by $P$ to the partial FDD such that the resulting partial FDD is equivalent to the rule sequence that is formed by all the rules formed by the predicates in $A_2$ followed by the rule formed by $P$. After appending $r$, all the new paths generated from $r$ constitute $r - A_2$.

Next, we consider how to append rule $r$ to this partial FDD. Suppose the partial FDD constructed from $A_2$ has a root $v$ with label $F_1$, and $v$ has $k$ outgoing edges $e_1, \ldots, e_k$. Let $r$ be the rule $(F_1 \in S_1) \wedge \cdots \wedge (F_d \in S_d) \rightarrow \langle decision \rangle$, which is formed by a predicate in $A_1$.

First, we examine whether we need to add another outgoing edge to $v$. If $S_1 - (I(e_1) \cup \cdots \cup I(e_k)) \neq \emptyset$, we need to add a new outgoing edge with label $S_1 - (I(e_1) \cup \cdots \cup I(e_k))$ to $v$, because any packet whose $F_1$ field is an element of $S_1 - (I(e_1) \cup \cdots \cup I(e_k))$ does not match any of the rules already in the partial FDD, but does match $r$ provided that the packet satisfies $(F_2 \in S_2) \wedge \cdots \wedge (F_d \in S_d)$. We then build a decision path from $(F_2 \in S_2) \wedge \cdots \wedge (F_d \in S_d) \rightarrow \langle decision \rangle$, and make the new edge of the node $v$ point to the first node of this decision path.

Second, we compare $S_1$ and $I(e_j)$ for each $j$ where $1 \le j \le k$. This comparison leads to one of the following three cases:

    1. $S_1 \cap I(e_j) = \emptyset$: In this case, we skip edge $e_j$ because any packet whose value of field $F_1$ is in set $I(e_j)$ does not match $r$.
    2. $S_1 \cap I(e_j) = I(e_j)$: In this case, for a packet whose value of field $F_1$ is in set $I(e_j)$, it may match one of the rules already in the partial FDD, and it may also match rule $r$. So, we append the rule $(F_2 \in S_2) \wedge \cdots \wedge (F_d \in S_d) \rightarrow \langle decision \rangle$ to the subgraph rooted at the node that $e_j$ points to.
    3. $S_1 \cap I(e_j) \neq \emptyset$ and $S_1 \cap I(e_j) \neq I(e_j)$: In this case, we split edge $e_j$ into two edges: $e'$ with label $I(e_j) - S_1$ and $e''$ with label $I(e_j) \cap S_1$. Then, we make two copies of the subgraph rooted at the node that $e_j$ points to, and let $e'$ and $e''$ point to one copy each.

Fig. 9. A partial FDD.

       We then deal with $e'$ using the first case, and $e''$
       using the second case.
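The three cases above amount to splitting an interval against the set it is compared with. As a minimal sketch of the intersect and minus operations on query results, the following Python code applies the same case analysis directly to interval predicates. It does not build a partial FDD; it is a simpler, swapped-in technique for illustration only. The representation (a predicate as a dictionary from field name to an inclusive integer interval) and all function names are assumptions made here, not part of the paper, and both predicates are assumed to range over the same fields.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]        # inclusive [lo, hi]; hypothetical representation
Predicate = Dict[str, Interval]   # field name -> interval, e.g. {"S": (6, 9)}


def intersect_predicates(p: Predicate, q: Predicate) -> Optional[Predicate]:
    """Field-wise intersection of two predicates; None if it is empty."""
    out: Predicate = {}
    for field in p:
        lo = max(p[field][0], q[field][0])
        hi = min(p[field][1], q[field][1])
        if lo > hi:               # one field has no common value
            return None
        out[field] = (lo, hi)
    return out


def intersect_results(a1: List[Predicate], a2: List[Predicate]) -> List[Predicate]:
    """Intersect every predicate in a1 with every predicate in a2."""
    pairs = (intersect_predicates(p, q) for p in a1 for q in a2)
    return [r for r in pairs if r is not None]


def minus_predicate(p: Predicate, q: Predicate) -> List[Predicate]:
    """Return disjoint predicates covering the packets in p but not in q."""
    common = intersect_predicates(p, q)
    if common is None:            # case 1: no overlap, p survives unchanged
        return [p]
    pieces: List[Predicate] = []
    rest = dict(p)
    for field in p:               # case 3: split off the parts outside q
        lo, hi = rest[field]
        clo, chi = common[field]
        if lo < clo:
            pieces.append({**rest, field: (lo, clo - 1)})
        if chi < hi:
            pieces.append({**rest, field: (chi + 1, hi)})
        rest[field] = (clo, chi)  # case 2: this part lies inside q, keep narrowing
    return pieces


def minus_results(a1: List[Predicate], a2: List[Predicate]) -> List[Predicate]:
    """Subtract every predicate of a2 from every predicate of a1."""
    result = list(a1)
    for q in a2:
        result = [piece for p in result for piece in minus_predicate(p, q)]
    return result


if __name__ == "__main__":
    a1 = [{"S": (6, 9), "D": (4, 7)}]
    a2 = [{"S": (3, 8), "D": (2, 5)}]
    print(intersect_results(a1, a2))  # [{'S': (6, 8), 'D': (4, 5)}]
    print(minus_results(a1, a2))      # [{'S': (9, 9), 'D': (4, 7)}, {'S': (6, 8), 'D': (6, 7)}]
```

Under these assumptions, the union is simply the concatenation of the two lists; the sketch omits the postprocessing step that minimizes the number of predicates.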
    We next show an example of computing the minus on
firewall query results. Let $A_1 = \{S \in [6, 9] \wedge D \in [4, 7]\}$ and
$A_2 = \{S \in [3, 8] \wedge D \in [2, 5]\}$. To compute $A_1 - A_2$, we first
construct a partial FDD from $A_2$, which is shown in Fig. 8.
Second, we append $S \in [6, 9] \wedge D \in [4, 7]$ to this partial FDD.
The resulting FDD is shown in Fig. 9. The dashed paths
represent the result of $A_1 - A_2$.

8    EXPERIMENTAL RESULTS
In this section, we evaluate the efficiency of our firewall
query processing algorithms by the average execution time
of each algorithm versus the total number of rules in the
original inconsistent firewalls. In the absence of publicly
available firewalls, we create synthetic firewalls according
to the characteristics of real-life packet classifiers discovered
in [3], [23]. Note that a firewall is also a packet classifier.
Each rule has the following five fields: interface, source IP
address, destination IP address, destination port number,
and protocol type. The programs are implemented in Sun
Java JDK 1.4. The experiments were carried out on a
SunBlade 2000 machine running Solaris 9 with a 1 GHz CPU
and 1 GB of memory.
    Fig. 10 shows the average execution time of the rule-based
firewall query processing algorithm and the FDD-based
firewall query processing algorithm versus the total
number of rules in the original inconsistent firewalls. In
Fig. 10, the horizontal axis indicates the total number of
rules in the original inconsistent firewalls, and the vertical
axis indicates the average execution time (in milliseconds)
for processing a firewall query. Note that in Fig. 10, the
execution time of the FDD-based firewall query processing
algorithm does not include the FDD construction time
because the conversion from a firewall to an equivalent
FDD is performed only once for each firewall, not for each
query. Similarly, the execution time of the rule-based
firewall query processing algorithm does not include the
time for converting an inconsistent firewall to an equivalent
consistent firewall because this conversion is performed
only once for each firewall, not for each query. Recall that
the procedure for converting an inconsistent firewall to a
consistent firewall consists of two steps: first, construct an
equivalent FDD from the original inconsistent firewall;
second, generate one rule for each decision path of the FDD,
and any sequence of all the rules defined by the decision
paths of the FDD constitutes the final consistent firewall.

Fig. 10. Query processing time versus number of rules.

    From Fig. 10, we can see that the FDD-based firewall
query processing algorithm is much more efficient than the
rule-based firewall query processing algorithm. For example,
for processing a query over an inconsistent firewall that
has 10,000 rules, the FDD-based query processing algorithm
uses about 10 milliseconds, while the rule-based query
processing algorithm uses about 100 milliseconds. The
experimental results in Fig. 10 confirm our analysis that the
FDD-based query processing algorithm saves execution
time by reducing repeated calculations.
    Fig. 11 shows the average execution time for constructing
an equivalent FDD from an inconsistent firewall. In Fig. 11,
the horizontal axis indicates the total number of rules in the
original inconsistent firewalls, and the vertical axis indicates
the average execution time (in seconds) for constructing
an equivalent FDD from an inconsistent firewall. From
Fig. 11, we can see that the FDD construction algorithm is
very efficient. It takes less than 4 seconds to construct an
equivalent FDD from an inconsistent firewall that has up to
10,000 rules. Thus, if one intends to run more than 40-50
queries on the same firewall, then using the FDD-based
algorithm, even including the cost of building the FDD, is
more efficient than the simple linear search.

Fig. 11. FDD construction time versus number of rules.

9    CONCLUDING REMARKS
We make a number of contributions in this paper. First, we
introduce a simple and effective SQL-like query language,


which is called the Structured Firewall Query Language, for
describing firewall queries. Second, we present a theorem,
the Firewall Query Theorem, as the foundation for developing
firewall query processing algorithms. Third, we
present an efficient query processing algorithm that uses
firewall decision diagrams as its core data structure. Our
experimental results show that this query processing
algorithm is very efficient. Fourth, we present methods for
optimizing firewall query results. Finally, we present
methods for performing the union, intersect, and minus
operations on firewall query results.

ACKNOWLEDGMENTS
The authors thank the anonymous reviewers for their
constructive comments and suggestions on improving the
presentation of this work. The work of Alex X. Liu is
supported in part by the US National Science Foundation
(NSF) under Grant No. CNS-0716407. The work of Mohamed
G. Gouda is supported by the NSF under Grant No. 0520250.

REFERENCES
[1]    ipchains, http://www.tldp.org/howto/ipchains-howto.html,
       2009.
[2]    E. Al-Shaer and H. Hamed, “Discovery of Policy Anomalies in
       Distributed Firewalls,” Proc. IEEE INFOCOM ’04, Mar. 2004.
[3]    F. Baboescu, S. Singh, and G. Varghese, “Packet Classification for
       Core Routers: Is There an Alternative to CAMs?” Proc. IEEE
       INFOCOM, 2003.
[4]    F. Baboescu and G. Varghese, “Fast and Scalable Conflict
       Detection for Packet Classifiers,” Proc. 10th IEEE Int’l Conf.
       Network Protocols, 2002.
[5]    Y. Bartal, A.J. Mayer, K. Nissim, and A. Wool, “Firmato: A Novel
       Firewall Management Toolkit,” Proc. IEEE Symp. Security and
       Privacy, pp. 17-31, 1999.
[6]    Y. Bartal, A.J. Mayer, K. Nissim, and A. Wool, “Firmato: A Novel
       Firewall Management Toolkit,” ACM Trans. Computer Systems,
       vol. 22, no. 4, pp. 381-420, 2004.
[7]    CERT, Test the Firewall System,
       http://www.cert.org/security-improvement/practices/p060.html, 2009.
[8]    CERT Coordination Center,
       http://www.cert.org/advisories/ca-2003-20.html, Aug. 2003.
[9]    CheckPoint FireWall-1, http://www.checkpoint.com/, Mar. 2005.
[10]   Cisco PIX 500 Series Firewalls,
       http://www.cisco.com/en/us/products/hw/vpndevc/ps2030/, Nov. 2003.
[11]   D. Moore et al.,
       http://www.caida.org/outreach/papers/2003/sapphire/sapphire.html, 2003.
[12]   D. Dobkin and R.J. Lipton, “Multidimensional Searching
       Problems,” SIAM J. Computing, vol. 5, no. 2, pp. 181-186.
[13]   D. Eastlake and P. Jones, “US Secure Hash Algorithm 1 (SHA-1),”
       RFC 3174, 2001.
[14]   D. Eppstein and S. Muthukrishnan, “Internet Packet Filter
       Management and Rectangle Geometry,” Proc. Symp. Discrete
       Algorithms, pp. 827-835, 2001.
[15]   P. Eronen and J. Zitting, “An Expert System for Analyzing
       Firewall Rules,” Proc. Sixth Nordic Workshop Secure IT Systems
       (NordSec ’01), pp. 100-107, 2001.
[16]   D. Farmer and W. Venema, Improving the Security of Your Site by
       Breaking into It,
       http://www.alw.nih.gov/Security/Docs/admin-guide-to-cracking.101.html, 1993.
[17]   M. Frantzen, F. Kerschbaum, E. Schultz, and S. Fahmy, “A
       Framework for Understanding Vulnerabilities in Firewalls Using
       a Dataflow Model of Firewall Internals,” Computers and Security,
       vol. 20, no. 3, pp. 263-270, 2001.
[18]   M. Freiss, Protecting Networks with SATAN. O’Reilly & Assoc., Inc.,
       1998.
[19]   M. Gouda, A.X. Liu, and M. Jafry, “Verification of Distributed
       Firewalls,” Proc. IEEE GLOBECOM, 2008.
[20]   M.G. Gouda and A.X. Liu, “Firewall Design: Consistency,
       Completeness and Compactness,” Proc. 24th IEEE Int’l Conf.
       Distributed Computing Systems (ICDCS ’04), pp. 320-327, 2004.
[21]   M.G. Gouda and A.X. Liu, “A Model of Stateful Firewalls and its
       Properties,” Proc. IEEE Int’l Conf. Dependable Systems and Networks
       (DSN ’05), pp. 320-327, June 2005.
[22]   M.G. Gouda and A.X. Liu, “Structured Firewall Design,” Computer
       Networks J., vol. 51, no. 4, pp. 1106-1120, Mar. 2007.
[23]   P. Gupta, “Algorithms for Routing Lookups and Packet
       Classification,” PhD thesis, Stanford Univ., 2000.
[24]   J.D. Guttman, “Filtering Postures: Local Enforcement for Global
       Policies,” Proc. IEEE Symp. Security and Privacy, pp. 120-129, 1997.
[25]   A. Hari, S. Suri, and G.M. Parulkar, “Detecting and Resolving
       Packet Filter Conflicts,” Proc. IEEE INFOCOM, pp. 1203-1212,
       2000.
[26]   S. Hazelhurst, A. Attar, and R. Sinnappan, “Algorithms for
       Improving the Dependability of Firewall and Filter Rule Lists,”
       Proc. Int’l Conf. Dependable Systems and Networks (DSN ’00),
       pp. 576-585, 2000.
[27]   J. Hwang, T. Xie, F. Chen, and A.X. Liu, “Systematic Structural
       Testing of Firewall Policies,” Proc. 27th IEEE Int’l Symp. Reliable
       Distributed Systems (SRDS), 2008.
[28]   S. Kamara, S. Fahmy, E. Schultz, F. Kerschbaum, and M. Frantzen,
       “Analysis of Vulnerabilities in Internet Firewalls,” Computers and
       Security, vol. 22, no. 3, pp. 214-232, 2003.
[29]   A.X. Liu, “Change-Impact Analysis of Firewall Policies,” Proc.
       12th European Symp. Research Computer Security (ESORICS),
       pp. 155-170, Sept. 2007.
[30]   A.X. Liu, “Firewall Policy Verification and Troubleshooting,” Proc.
       IEEE Int’l Conf. Comm. (ICC), May 2008.
[31]   A.X. Liu and M.G. Gouda, “Complete Redundancy Detection in
       Firewalls,” Proc. 19th Ann. IFIP Conf. Data and Applications Security,
       pp. 196-209, Aug. 2005.
[32]   A.X. Liu and M.G. Gouda, “Diverse Firewall Design,” IEEE Trans.
       Parallel and Distributed Systems, to be published.
[33]   A.X. Liu, C.R. Meiners, and E. Torng, “TCAM Razor: A Systematic
       Approach Towards Minimizing Packet Classifiers in TCAMs,”
       IEEE/ACM Trans. Networking, to be published.
[34]   A. Mayer, A. Wool, and E. Ziskind, “Fang: A Firewall Analysis
       Engine,” Proc. IEEE Symp. Security and Privacy, pp. 177-187, 2000.
[35]   A. Mayer, A. Wool, and E. Ziskind, “Offline Firewall Analysis,”
       Int’l J. Information Security, vol. 5, no. 3, pp. 125-144, 2005.
[36]   J.D. Moffett and M.S. Sloman, “Policy Conflict Analysis in
       Distributed System Management,” J. Organizational Computing,
       vol. 4, no. 1, pp. 1-22, 1994.
[37]   D.R. Morrison, “Patricia Practical Algorithm to Retrieve
       Information Coded in Alphanumeric,” J. ACM, vol. 15, no. 4,
       pp. 514-534, 1968.
[38]   Nessus, http://www.nessus.org/, Mar. 2004.
[39]   R. Rivest, The MD5 Message-Digest Algorithm, RFC 1321, 1992.
[40]   D. Rovniagin and A. Wool, “The Geometric Efficient Matching
       Algorithm for Firewalls,” Proc. 23rd IEEE Convention of Electrical
       and Electronics Eng. in Israel (IEEEI), technical report, pp. 153-156,
       http://www.eng.tau.ac.il/~yash/ees2003-6.ps, 2004.
[41]   A.D. Rubin, D. Geer, and M.J. Ranum, Web Security Sourcebook,
       first ed., Wiley Computer Publishing, 1997.
[42]   A. Wool, “Architecting the Lumeta Firewall Analyzer,” Proc. 10th
       USENIX Security Symp., pp. 85-97, Aug. 2001.
[43]   A. Wool, “A Quantitative Study of Firewall Configuration Errors,”
       Computer, vol. 37, no. 6, pp. 62-67, June 2004.
[44]   A. Wool, “The Use and Usability of Direction-Based Filtering in
       Firewalls,” Computers & Security, vol. 23, no. 6, pp. 459-468, 2004.
[45]   J. Xu and M. Singhal, “Design and Evaluation of a
       High-Performance ATM Firewall Switch and Its Applications,” IEEE J.
       Selected Areas in Comm., vol. 17, no. 6, pp. 1190-1200, 1999.
[46]   J. Xu and M. Singhal, “Design of a High-Performance ATM
       Firewall,” ACM Trans. Information and System Security, vol. 2, no. 3,
       pp. 269-294, 1999.
[47]   L. Yuan, H. Chen, J. Mai, C.-N. Chuah, Z. Su, and P. Mohapatra,
       “Fireman: A Toolkit for Firewall Modeling and Analysis,” Proc.
       IEEE Symp. Security and Privacy, May 2006.

Alex X. Liu received the PhD degree in computer science from The
University of Texas at Austin in 2006. He is currently an assistant
professor in the Department of Computer Science and Engineering at
Michigan State University. He won the 2004 IEEE&IFIP William C. Carter
Award, the 2004 National Outstanding Overseas Students Award sponsored
by the Ministry of Education of China, the 2005 George H. Mitchell
Award for Excellence in Graduate Research in the University of Texas at
Austin, and the 2005 James C. Browne Outstanding Graduate Student
Fellowship in the University of Texas at Austin. His research interests
include computer and network security, dependable and high-assurance
computing, applied cryptography, computer networks, operating systems,
and distributed computing. He is a member of the IEEE.

Mohamed G. Gouda received the BSc degree in engineering and in
mathematics from Cairo University, the MA degree in mathematics from
York University, and the master’s and PhD degrees in computer science
from the University of Waterloo. He was with the Honeywell Corporate
Technology Center at Minneapolis from 1977 to 1980. In 1980, he joined
The University of Texas at Austin, where he currently holds the Mike A.
Myers Centennial professorship in computer sciences. He is the 1993
winner of the Kuwait Award in Basic Sciences. He won the 2001 IEEE
Communication Society William R. Bennet Best Paper Award for his paper
“Secure Group Communications Using Key Graphs,” coauthored with C.K.
Wong and S.S. Lam, and published in the IEEE/ACM Transactions on
Networking (vol. 8, no. 1, pp. 16-30). In 2004, his paper “Diverse
Firewall Design,” coauthored with Alex X. Liu and published in the
Proceedings of the International Conference on Dependable Systems and
Networks, won the William C. Carter award. He is a member of the IEEE.



				