Implementing Policy-based Content Filtering for Web Servers

Tony White (1), Eugen Bacic (2)
(1) School of Computer Science, Carleton University
(2) Cinnabar Networks
{arpwhite@scs.carleton.ca, ebacic@cinnabar.ca}


                Abstract: Web servers dominate our view of the Web today. Security
                provided by them has been implemented with varying degrees of success.
                Web servers are frequently successfully attacked, with subsequent loss of
                corporate face or revenue. Recent legislation has increased the
                importance of ensuring that only approved users gain access to
                information, which often implies filtering content served by applications.
                While content filtering can be implemented at the application level, this
                paper describes an innovative architecture for policy-based filtering that
                can be integrated with existing web applications.

                Keywords: web server, policy, content filtering

1. Introduction

Servers dominate the Web today. We rely on search engines, meta-search
engines, portals and a wide range of other services hosted on web servers
accessed using HTTP or its secure variant. Business-to-business (B2B)
interactions involve web servers and other modes of access. In a time when
knowledge and information are increasingly the measurable assets of a
corporation, information security is becoming more and more important.
Recent legislation concerning the privacy of health care records (HIPAA)
has increased the importance of secure web-based information access. While
access and content control have been addressed in a number of ways,
solutions have been implemented on a product-by-product basis. A consistent
solution for access and content control has yet to be implemented for web
servers, although applicable criteria and models exist [1], [2], [3], [4].
Heterogeneous implementation of access and content control has led to
incoherent security solutions being placed in service, with access control
being determined fully, or in part, by the path through which information
is accessed. Here, we mean access control to be the decision to process a
given HTTP request. By content control we mean the filtering of information
generated by a web application based upon the identity of the user and the
state of a workflow process.
     Even within the 3-tier web application architecture that is most
commonly employed, where web, application and database servers have been
combined, access control has been built into all three components. Clearly,
when the security responsibility is distributed across multiple components,
creating a consistent view of security is difficult. The weakest link is
not always obvious. For example, a user with web and database access might
find that their ability to access information varies depending upon whether
they access the information in the database directly or via the Web. This
could easily occur if access control is not harmonized between the database
and the Web applications that retrieve and process the data. Clearly it is very
difficult to decide what content should be provided to an end user if
filtering takes place at multiple points in the n-tier business
application, as no single component has a view of the entire workflow
process. This motivates the design of the centralized, web-based content
filtering solution described in this paper.

2. Web Server with SecureRealms

The SecureRealms architecture was introduced in [8]. The essential
characteristics of the architecture are that all security objects are
represented as entities, and all entities have associated security policies
that are stored in a repository called the Virtual Resource Attribute
Database (VRAD). The security policies associated with entities are
evaluated using a Generic Policy Engine [10] built into the Realm
Controller. Mediation of access to resources is achieved by the evaluation
of the security policy associated with the resource requested in the
context of the resource access. For example, in the case of a web page
access, the two entities involved are the user requesting the page and the
page itself. The context of the access would include the Apache permissions
associated with the page or directory.

[Figure 1: Web Server Content Filtering. Left panel (mediation): the
Browser sends ‘Page Please’; after web server mediation (‘OK to serve
page’) the Web Agent exchanges Tmediate and Rmediate messages with the
Realm Controller. Right panel (filtering): the Web Server passes the
‘Served Page’ to the Web Agent, which exchanges Tevaluate and Revaluate
messages with the Realm Controller before the ‘Filtered Page’ is returned
to the Browser. In both panels web pages are either static (from the
filesystem) or dynamic (from an application).]

     SecureRealms defines a small, functional, security-aware
meta-language, called Idyllic [8], which is a Policy Meta Language. This
language is capable of codifying any business rule and resembles LISP,
which has well known properties [7]. It is based on s-expressions, which
are becoming an important component of XML as X-expressions (XEXPR) [5].
There is a straightforward mapping from XML’s XEXPR to Idyllic’s
s-expressions. It should also be noted that authorization capabilities for
XML are only now emerging, with the SAML specification [6] still under
discussion. SAML represents a familiar access control solution for
authorization, one we feel will prove insufficient for the dynamic security
needs of c-commerce.
     The Web server instantiation of the SecureRealms architecture
(hereafter referred to as SR-Web) operates by introducing a plug-in to the
web server that interrupts the usual flow of content delivery. More
specifically, we wait for the server to decide if a page can be served, and
then we use the services of the Realm Controller to determine if the web
server should still be allowed to serve the page. Once page content is
available, we intercept it so that we can perform
sub-page level filtering before giving the page
back to the server for delivery to the requesting browser or application.
     The mediation flow of control is shown in Figure 1. A browser requests
a page (‘Page Please’). If the web server determines (during web server
mediation) that the page request may be honored, the web agent then gains
access to the request. The request is packaged into a Tmediate message for
transmission to the Realm Controller. The Realm Controller responds with an
Rmediate message. If the mediation indicates that the request is allowable,
the web agent returns control to the web server with an indication that the
request may continue. The web server then causes the page to be returned
(static) or generated (dynamic). The interaction between the Web Server and
the remaining layers of the web application is unchanged.
     Once the requested page content is available, the web server delivers
the page to the Web Agent. If filtering is enabled, the web agent scans the
page contents for special tags. If any are found, the information is
packaged up into a Tevaluate message for delivery to the Realm Controller.
The Realm Controller evaluates the incoming data and returns information to
the Web Agent that will allow it to filter the served page. Once the page
is filtered, it is passed back to the web server for final delivery to the
requesting browser.

2.1 The Web Agent

[Figure 2: Web Agent Architecture. Components: Web Agent/Web Server
adapter, Web Agent Control, Mediate, Filter, Cache.]

The Web Agent is constructed to maximize the amount of server-independent
code. It does this by providing an adapter component that hides the details
of the server implementation. The following sections discuss the adapter
and filter modules shown in Figure 2.

2.1.1   Web Server – Web Agent Adapter

A web server goes through a number of steps between receiving a page
request from a browser and returning the page for viewing. Web servers have
been designed so that external programs can bind to the web server at any
or all of these points to replace or augment the server’s normal behaviour.
While the processing stages are similar for the web servers we have
investigated (Apache, IIS), the actual method of binding to the server is
different for each server. The Web Agent therefore has two server-specific
pieces of code (the adapters) that bind the common portion of the Web Agent
to the server.
     In practice, the adapters set callbacks that the server uses to signal
that a server processing stage is complete. The callback routine gets
session and transaction parameters directly through the callback, or
separately as globally available server environment variables. For example,
one server environment variable records the user name associated with the
current page request, another records the fully remapped page path, another
the server version, and so on.

2.1.1.1 Apache

Under Apache, server extensions are called modules. Modules are usually
compiled into the server, but can also be linked in dynamically. We
implemented the dynamic link approach. The Apache web agent must be
multithreaded to work correctly with the server. Apache performs the
following functions when processing a page request:

1. URL -> Filename translation
2. Authentication ID checking [is the user who they say they are?]
3. Authentication access checking [is the user authorized here?]
4. Access checking other than authentication
5. Determining MIME type of the object requested
6. ‘Fixups’ (there aren't any of these yet)
7. Actually sending a response back to the client
8. Logging the request

     Steps 1-4 are where Apache resolves the information necessary to
perform mediation and carries out its own mediation. In step 5 Apache
determines what sort of handler should be used to process the page request.
Step 7 is where the server actually either retrieves a static page or
invokes an application to create a dynamic page.
     For us to do mediation, we need access to fully resolved request
information at a point where we can stop further request processing. Step 6
gives us this access point: here we have access to full pathnames and
authenticated user names, and we can instruct Apache to abort the request
depending on the results of our mediation.
     On the filtering side, we need to capture the output of step 7.
Unfortunately, the modules doing the work in step 7 return data directly to
the web server without offering us a glimpse at it on the way past. To
resolve this, the Apache adapter wraps the step 7 modules in a handler of
our own that itself invokes the original step 7 modules in a way that
allows us access to returned data.

2.1.1.2 IIS

Under IIS, server extensions are called either extensions or filters,
depending on what functionality they implement. For the Web Agent, we
create a filter to gain the most complete access to the server. IIS filters
are built as DLLs. The IIS web agent must be multithreaded to work
correctly with the server.
     The IIS adapter has to do the same sort of things as the Apache
adapter: allow us to catch resolved user name and file path information for
mediation purposes and then allow us to intercept output prior to delivery
to the requesting browser. The IIS API offers us the following hooks:

1.  SF_NOTIFY_READ_RAW_DATA: When a client sends a request, one or more
    SF_NOTIFY_READ_RAW_DATA notifications will occur.
2.  SF_NOTIFY_PREPROC_HEADERS: This notification indicates that the server
    has completed pre-processing of the headers associated with the
    request, but has not yet begun to process the information contained
    within the headers.
3.  SF_NOTIFY_URL_MAP: An SF_NOTIFY_URL_MAP notification occurs whenever
    the server is converting a URL into a physical path.
4.  SF_NOTIFY_AUTHENTICATION: An SF_NOTIFY_AUTHENTICATION notification
    occurs just before IIS attempts to authenticate the client.
5.  SF_NOTIFY_AUTH_COMPLETE: This notification fires after the client’s
    identity has been negotiated with the client.
6.  SF_NOTIFY_READ_RAW_DATA: As mentioned in step 1, if the client has more
    data to send, one or more SF_NOTIFY_READ_RAW_DATA notifications will
    occur here.
7.  At this point in the request, IIS will begin to process the substance
    of the request. This may be done by an ISAPI extension, a CGI
    application, a script engine (such as ASP, PERL, and so on), or by IIS
    itself for static files.
8.  SF_NOTIFY_SEND_RESPONSE: The SF_NOTIFY_SEND_RESPONSE event occurs after
    the request is processed and before headers are sent back to the
    client.
9.  SF_NOTIFY_SEND_RAW_DATA: As the request handler returns data to the
    client, one or more SF_NOTIFY_SEND_RAW_DATA notifications will occur.
10. SF_NOTIFY_END_OF_REQUEST: At the end of each request, the
    SF_NOTIFY_END_OF_REQUEST notification occurs.
11. SF_NOTIFY_LOG: After the HTTP request has been completed, the
    SF_NOTIFY_LOG notification occurs just before IIS writes the request to
    the IIS log.
12. SF_NOTIFY_END_OF_NET_SESSION: When the connection between the client
    and server is closed, the
    SF_NOTIFY_END_OF_NET_SESSION notification occurs.

     Our mediation is triggered by the SF_NOTIFY_AUTH_COMPLETE notification
in step 5. Unlike Apache, IIS does offer us a look at returned data.
Filtering is then carried out in response to the SF_NOTIFY_SEND_RAW_DATA
event (or events) by passing this data to the Web Agent for possible
modification. The flow of control shown corresponds to IIS V5. IIS V4 has a
slightly smaller set of hooks and requires a slightly different flow of
control. The principles, however, are the same for both versions.

2.1.2   Adapter – Web Agent interface

The previous two sections have documented where and how the adapter portion
of the Web Agent hooks into the web servers. To ensure that the Web Agent
code is common to all web servers, the adapters present a common interface
between the web server and the Web Agent. The interface is implemented as a
set of 5 callbacks instantiated in the Web Agent code. The callbacks are:
    •   Boolean canAccessPage (authenticatedUserName, filePath)
    •   Boolean filterOn (filePath)
    •   Boolean filterStart (authenticatedUserName)
    •   Integer filterData (authenticatedUserName, pageContent, size)
    •   Boolean filterEnd (authenticatedUserName)
     The first callback invokes the mediation portion of the Web Agent, and
its outcome determines whether the adapter will allow the web server to
continue processing or not. The remaining four callbacks relate to Web
Agent filtering. The first of these (‘filterOn’) allows the adapter to
determine if Web Agent filtering is enabled. If filtering is disabled, the
adapter can speed page processing by not passing page data to the Web
Agent. The ‘filterStart’ and ‘filterEnd’ callbacks allow the Web Agent to
do any page setup and teardown activities that may be necessary. The
‘filterData’ callback may be invoked multiple times to pass page data to
the Web Agent for filtering.

2.1.3 Page Request Mediation

If SR-Web determines that a user cannot access a page, the page that will
be returned to the requesting user will be identical to the one the web
server would have returned if it had blocked the page access. This has the
advantage of making the authorization engine transparent to the user, with
identical errors being returned with or without SR-Web.
     SR-Web mediation occurs after any mediation done by the web server. At
that point, the adapter will invoke the ‘canAccessPage’ callback with the
authenticated user name and the requested file name.

2.1.4    Page Content Filtering

Filtering of web pages is a significant new security function. A filtered
page is one that may have had portions of the page content removed as
determined by the privileges of the requesting user and the policies
attached to that portion of the page. A page to be filtered, whether it is
a static or dynamic web page, must properly enclose the block that is to be
filtered in a pair of ‘srf’ start and end tags. An example is shown below.
The ‘srf’ start tag must have a ‘filter’ attribute. The value of the filter
attribute is one or more name/value pairs. The attribute name corresponds
to an entity called a filter entity. A web page, modified to allow for
sub-page level filtering, now looks like:

<html>
<head>
<meta name="srf" content="FilterEntityA, FilterEntityB">
</head>
<body>
<p>Some text.</p>
<srf filter="(FilterEntityA userid), (FilterEntityB userid)">
<p>Text to be secured.</p>
</srf>
</body>
</html>

     As the Web Agent filters the original page, the second paragraph (and
its enclosing ‘srf’ tag) may be removed from the final output if the Web
Agent, working with the Realm Controller, determines that the requesting
user cannot access the material.

2.1.4.1 Filter Entities

Filter entities are a way of naming content that shares similar
characteristics. The characteristics are identified by end-user analysis of
web page data. A filter entity is a regular VRAD entity to which a policy
may be attached. The indirection allows different policies to be attached
to a filter entity (and by extension to a fragment of a web page) without
having to alter the source page data. Filter entities are explicitly
created by management activity.

2.1.4.2 Filter Operation

Filtering proceeds in several steps. For each ‘srf’ tag, the filter entity
names and requesting user name are bundled into a Plan Nein Tevaluate
message for transmission to the Realm Controller. The Realm Controller
evaluates the policies bound to each filter entity in the context of the
user name and packages the results into an Revaluate message for return to
the Web Agent.
     The filtering software then matches the values returned for each
filter entity with the required value from the page. If all the returned
filter entity values evaluate to true, the secured content will be passed
on to the requesting user; otherwise it will disappear from the output.
     Should a web page contain a filter entity name that does not have a
corresponding entity in the VRAD, a false value will be returned to the Web
Agent for that filter entity. This will have the effect of suppressing the
affected data.
     Filtering is potentially an expensive operation, as every output page
has to be checked for the presence of ‘srf’ tags. To increase performance,
an ‘srf’ meta-tag must be present in the head portion of a web page. The
contents of the meta-tag will be a list of all filter entities referenced
in the page body. The filter entity names will be bundled up with the
requesting user name to be sent to the Realm Controller via the Plan Nein
Tevaluate message. This can be done even before the balance of the web page
is available. If no filtering meta-tags are found in the page head, the
rest of the page does not have to be screened.

3. Future Work

     Policies have begun to be of greater and greater interest. The recent
work on the eXtensible Access Control Markup Language (XACML) at OASIS [9]
is a case in point. The SecureRealms architecture, and Idyllic in
particular, were an outgrowth of years of R&D effort. Idyllic was designed
to be syntactically correct and provable via denotational logic [10] and
reflected existing technologies of the time.
     XACML specifies a “subject-target-action-condition” oriented policy
for XML documents. A subject is a unique identity, group, or role, while a
target is what is typically referred to as a resource or object. XACML
includes conditional authorization policies, as well as policies with
external post-conditions to specify actions that must be executed prior to
permitting access.
     With XACML being both an access control policy language and a
request/response language, it appears similar in scope and intent to
Idyllic. The XACML policy language is used to express access control
policies, while the request/response language expresses queries as to
whether a particular access request should be allowed and provides the
appropriate response.
     For example, in the case where a subject wants to take some action on
a particular object, or resource, the subject submits its query to the
component protecting the resource (e.g., file system, web server). This
component is called a Policy Enforcement Point (PEP). The PEP forms a
request (using the request language) based on the attributes of the
subject, action, resource, and any other relevant information. The PEP then
sends this request to a Policy Decision Point
(PDP), which examines the request, retrieves the relevant policies, and
determines whether access should be granted. That answer (expressed in the
response language) is returned to the PEP, which can then allow or deny
access.
     With XML becoming a lingua franca for communication of logic between
disparate components, it only stands to reason that efforts should be made
to see whether or not the lessons learned from SecureRealms, Idyllic, and
the Realm Controller can be migrated to a full XML-based implementation.

4. Conclusion

As computer networks grow, security is becoming more of a concern with each
passing day. Organizations view and relate to information differently and
have differing requirements for the protection, dissemination, and
modification of their resources. There are now important legal
considerations in granting individuals access to information. Content
filtering as well as access management become important considerations when
designing a web-based information system.
     To date, many organizations have met their security concerns by
implementing access prevention mechanisms such as firewalls, cryptography,
and virtual private networks. General access to host systems is provided
based on the premise that once authenticated, users can be given full
freedom to perform their duties. Existing security products protect only
the perimeter, creating islands of security, and although each performs its
individual task very well, interoperability and workflow-related issues
constantly arise. Solutions to the interoperability problems include
special servers accessible to external partners and the use of web servers
to store restricted views of information. Such duplication of information
often leads to errors due to inconsistency and is expensive to maintain.
     This paper has described an architecture where content can be modified
after creation, based upon policy-based filtering. Legacy n-tier web
applications require minimal modification in order to take advantage of the
enhanced security: new srf tags need only be added to page content. Content
filtering and access control are delegated to a centralized security server
that is capable of understanding workflow. Security, independent of access
path, is clearly provided by the design. We believe that it represents a
significant step forward in providing technology-independent authorization.

References

[1] Department of Defense Trusted Computer System Evaluation Criteria.
    DoD 5200.28-STD, December 1985.
[2] Communications Security Establishment, The Canadian Trusted Computer
    Product Evaluation Criteria. Version 3.0e, January 1993. The
    Communications Security Establishment, Government of Canada.
[3] Information Technology Security Evaluation Criteria. Harmonised
    Criteria of France - Germany - the Netherlands - the United Kingdom.
    Version 1, May 2, 1990.
[4] Bell, David E. and L.J. LaPadula. Secure Computer Systems: Mathematical
    Foundations, ESD-TR-73-278, Volumes I, II, and III. The MITRE
    Corporation, March, May, and December 1973.
[5] http://www.w3.org/TR/xexpr/
[6] See SAML references on http://www.oasis-open.org/committees/security/
[7] Lee, P. and Pleban, U.F., “On the Use of LISP in Implementing
    Denotational Semantics”, Proceedings of 1986 ACM Conference on LISP and
    Functional Programming, Cambridge, Mass., 1986. pp. 233 - 248.
[8] White T. and Bacic E. Authorization as a Service provided by a Generic
    Policy Engine. In Proceedings of the 2002 International Conference on
    Security and Management, Las Vegas, June 24-27 2002.
[9] http://www.oasis-open.org/committees/xacml/
[10] Bacic, E. The Generic Policy Engine. Master of Computer Science
    Thesis, Carleton University, May 1998.
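
As an illustration of the filter operation described in Section 2.1.4.2,
the following is a minimal Python sketch, not the SR-Web implementation.
It assumes that the Revaluate result can be modelled as one boolean per
filter entity, that only the entity name in each name/value pair is needed,
and that an allowed block keeps its inner content while the ‘srf’ wrapper
is dropped. The names `filter_page`, `make_evaluator` and `PAGE` are
hypothetical, the Realm Controller is replaced by a local function, and
regular expressions stand in for a real HTML parser (nested ‘srf’ blocks
are not handled).

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for the Realm Controller: it receives a user name and
# a list of filter entity names (the Tevaluate payload) and returns one
# boolean per entity (the Revaluate payload).
Evaluator = Callable[[str, List[str]], Dict[str, bool]]

META_RE = re.compile(r'<meta\s+name="srf"\s+content="([^"]*)"\s*/?>', re.I)
SRF_RE = re.compile(r'<srf\s+filter="([^"]*)"\s*>(.*?)</srf>', re.I | re.S)
ENTITY_RE = re.compile(r'\(\s*([A-Za-z_]\w*)')

# Example page modelled on the one in Section 2.1.4.
PAGE = (
    '<html><head>'
    '<meta name="srf" content="FilterEntityA, FilterEntityB">'
    '</head><body><p>Some text.</p>'
    '<srf filter="(FilterEntityA userid), (FilterEntityB userid)">'
    '<p>Text to be secured.</p></srf></body></html>'
)


def make_evaluator(policies: Dict[str, Callable[[str], bool]]) -> Evaluator:
    """Build a toy evaluator; entities missing from `policies` yield False,
    mirroring the behaviour described for entities absent from the VRAD."""
    def evaluate(user: str, entities: List[str]) -> Dict[str, bool]:
        return {e: bool(policies[e](user)) if e in policies else False
                for e in entities}
    return evaluate


def filter_page(page: str, user: str, evaluate: Evaluator) -> str:
    """Remove every 'srf' block whose filter entities do not all allow `user`."""
    # Fast path: without an srf meta-tag the page is not screened at all.
    # (For simplicity the whole page is searched, not just the head.)
    if META_RE.search(page) is None:
        return page

    def replace(match: re.Match) -> str:
        entities = ENTITY_RE.findall(match.group(1))
        results = evaluate(user, entities)
        allowed = bool(entities) and all(results.get(e, False) for e in entities)
        # Assumption: on success the inner content is kept and the srf
        # wrapper is dropped; on failure the whole block disappears.
        return match.group(2) if allowed else ""

    return SRF_RE.sub(replace, page)
```

With a toy policy in which FilterEntityA admits two users and FilterEntityB
only one of them, calling `filter_page(PAGE, user, evaluator)` keeps the
secured paragraph for the user admitted by both entities and removes it for
the other. A real agent would process page data incrementally as it arrives
through ‘filterData’ rather than on a complete string.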