A Collaborative Environment for Customizable Complex Event Processing in Financial Information Systems
[Regular Paper (it can also be considered as an Industry Paper)]




ABSTRACT
We present a framework that enables financial sector customers (such as banks, credit card companies, etc.) to build collaborative protection systems to guard against coordinated Internet-based attacks. The essential element of this framework is a new programming abstraction, called Semantic Room (SR), through which interested parties can process data and share information and computing resources in a trusted and controlled fashion. To this end, each SR is associated with a contract which, among other things, specifies its functionality (e.g., botnet and stealthy scan detection), QoS, and a set of rules governing its membership. We present the design of two event processing systems that we implemented to support the SR functionality in large distributed settings, and show how they can be used for detecting stealthy scans and man-in-the-middle attacks.

Keywords
Collaborative architectures for event processing; distributed event processing; financial infrastructure; federated event-based systems; domain-specific deployments of event-based systems; MapReduce

1.   INTRODUCTION
The growing exposure to the Internet has made financial institutions increasingly vulnerable to a variety of security-related risks, such as increasingly sophisticated cyber attacks aiming at capturing high-value (or otherwise sensitive) information, or at disrupting service operation for various purposes. Such attacks result in both short- and long-term economic losses, due to the lack of service availability and infrastructural resilience and to the decreased level of trust on the part of customers. This is why such attacks are categorized as an operational risk by the Basel Committee on Banking Supervision in the first pillar of the Basel II accord [3].

To date, these attacks have been faced in isolation by each single financial institution, using several tools that reinforce its defense perimeter (e.g., intrusion detection systems, firewalls, etc.). These tools detect possible attacks by exploiting the information available from the logs maintained at the financial institution, for example, by checking whether some host performs suspicious activities within certain time windows. Nowadays, attacks are more sophisticated, making this kind of defense inadequate. Specifically, attacks are distributed in space and time. Distributed in space means that attacks are coordinated on a large-scale basis and originate from multiple geographically dispersed locations. Distributed in time means that they often consist of a preparation phase spanning several days or weeks and involving multiple preparatory steps aimed at identifying vulnerabilities and attack vectors (such as accidentally open ports) [7, 8, 10]. Detecting these attacks requires a broader view of what is happening on the Internet, which could be obtained by sharing and combining the information available at several financial institutions. This information then has to be processed and correlated on the fly in order to detect threats and frauds. Even though this sharing can give financial institutions a great advantage, it should be carried out only on a clear contractual basis and in a trusted and secure environment capable of meeting the strict privacy and confidentiality requirements of financial institutions. Therefore, there is an urgent need to establish contractually regulated collaborative environments that can be created in a structured and controlled manner, and whose participants can belong to different organizational, administrative and geographical domains.

In this paper we introduce a novel programming abstraction called Semantic Room (SR) that enables the construction of such collaborative, contractually regulated environments. These environments perform distributed event aggregation and correlation on the data provided by the organizations participating in the SR, with the aim of monitoring widely distributed infrastructures and providing early detection of attacks, frauds and threats. Each SR has a specific strategic objective to meet (e.g., botnet detection, stealthy scan detection, Man-In-The-Middle detection) and is associated with both a contract, which specifies the set of rights and obligations governing the SR membership, and different software technologies that are used to carry out the data processing and sharing within the SR. SRs can communicate with each other; that is, the output of one SR can be used as the input of another SR, thus enabling the deployment of a modular environment.
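To make the argument about attacks distributed in space concrete, the following is a minimal, hypothetical Python sketch (not the SR implementation described in this paper) of the kind of correlation a stealthy scan detection SR could perform in MapReduce style. It assumes each member contributes connection attempts as (institution, source IP, destination port) records; the record layout, both thresholds, and the absence of time windows are illustrative assumptions. A map step keys records by source IP and a reduce step collects the distinct (institution, port) targets per source, so a scanner that stays below the local threshold at every institution can still be flagged once the members' data are combined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (institution, source_ip, destination_port).
Record = Tuple[str, str, int]

LOCAL_THRESHOLD = 3   # assumed: distinct ports one institution would flag on its own
GLOBAL_THRESHOLD = 6  # assumed: distinct (institution, port) targets flagged by the SR


def map_phase(records: Iterable[Record]) -> Iterable[Tuple[str, Tuple[str, int]]]:
    """Map step: key every connection attempt by its source IP."""
    for institution, src_ip, port in records:
        yield src_ip, (institution, port)


def reduce_phase(pairs: Iterable[Tuple[str, Tuple[str, int]]]) -> Dict[str, Set[Tuple[str, int]]]:
    """Group-and-reduce step: collect the distinct targets each source touched."""
    targets: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for src_ip, target in pairs:
        targets[src_ip].add(target)
    return targets


def local_suspects(records: List[Record], institution: str) -> Set[str]:
    """What a single institution detects when analyzing only its own logs."""
    ports: Dict[str, Set[int]] = defaultdict(set)
    for inst, src_ip, port in records:
        if inst == institution:
            ports[src_ip].add(port)
    return {ip for ip, p in ports.items() if len(p) >= LOCAL_THRESHOLD}


def global_suspects(records: List[Record]) -> Set[str]:
    """What the SR detects after correlating all members' data."""
    targets = reduce_phase(map_phase(records))
    return {ip for ip, t in targets.items() if len(t) >= GLOBAL_THRESHOLD}


if __name__ == "__main__":
    # A slow scanner probes only two ports at each of three banks.
    records: List[Record] = [
        ("bank_a", "203.0.113.7", 22), ("bank_a", "203.0.113.7", 23),
        ("bank_b", "203.0.113.7", 80), ("bank_b", "203.0.113.7", 443),
        ("bank_c", "203.0.113.7", 3389), ("bank_c", "203.0.113.7", 8080),
        ("bank_a", "198.51.100.2", 443),  # ordinary client traffic
    ]
    for bank in ("bank_a", "bank_b", "bank_c"):
        print(bank, sorted(local_suspects(records, bank)))  # each prints an empty list
    print("SR", sorted(global_suspects(records)))           # ['203.0.113.7']
```

In an actual MapReduce deployment the map and reduce functions would run in parallel over partitions of the members' data, and the grouping done here with a dictionary would be performed by the framework's shuffle phase.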
The paper also describes a framework that supports the SR programming abstraction, and discusses a number of design alternatives for it. One of its key design features is the clear separation between SR management and the complex event processing and sharing. The former is provided by basic SR software that guarantees the construction of the trusted and contractually regulated environment (i.e., the SR) necessary for executing a secure event processing computation. The latter is performed within the SR and can be customized according to the SR objective, the available computational resources, and the event processing technologies that can be deployed. In other words, the SR is agnostic with respect to the event processing technology (e.g., [17], [9]) and paradigm (i.e., centralized vs. distributed event processing) being used. It is the responsibility of the entity that instantiates the SR to customize the processing so as to meet the SR objective and the contractual obligations.

To validate the framework, we present two different instances of SRs: an SR for stealthy scan detection that uses MapReduce technologies [23] for processing, and an SR for Man-In-The-Middle attack detection that uses DHT-based processing.

The rest of this paper is structured as follows. Section 2 introduces related work. Section 3 describes the Semantic Room abstraction, highlighting the principal elements that define it. Section 4 presents the framework we have designed to support the construction, deployment and execution of SRs. Section 5 describes two different SRs that can be instantiated using the framework, and finally Section 6 discusses the main conclusions of this work and future work.

2.   RELATED WORK
Detecting event patterns, sometimes referred to as situations, and reacting to them are at the core of Complex Event Processing (CEP) and Stream Processing (SP), both of which play an important role in the IT technologies employed by the financial sector, as overviewed in [15]. In particular, IBM System S [17] has been used by market makers to process high-volume market data and obtain low-latency results, as reported in [35]. System S, like other CEP/SP systems (e.g., [16, 29]), is based on event detection across distributed event sources.

The use of massive complex event processing among heterogeneous organizations for detecting network anomalies and failures has been suggested and evaluated in [22]. The usefulness of collaboration and information sharing among telco operators for discovering specific network attacks has also been pointed out in [32, 34]. These works clearly highlight that the main limitation of the collaborative approach concerns confidentiality requirements. These requirements may be specified by the organizations that share data, and they can make the collaboration itself hardly possible, as organizations are typically not willing to disclose any private or sensitive information. In our framework, the notion of SR can be effectively used to build a secure and trusted environment, enriched with the degree of privacy needed by the participants, in which a massive and collaborative event processing computation can occur.

Recently, collaborative approaches addressing the specific problem of Intrusion Detection Systems (IDSs) have been proposed in a variety of works [4, 25, 36, 27]. Unlike standalone IDSs, collaborative IDSs significantly improve the time and efficiency of misuse detection by sharing information on attacks among distributed IDSs from one or more organizations [37]. The main principle of these approaches is that local IDSs detect suspects by analyzing their own data. These suspects are then disseminated, possibly using peer-to-peer links. This approach has two main limitations: it relies on data that can be freely exchanged among the peers, and it does not fully exploit the information seen at every site. The former constraint can be very strong, especially in the financial context we are evaluating, due to the data confidentiality requirements that must be met. Our framework addresses this specific issue, as it is designed to provide different levels of anonymization of the data that organizations inject into the SR. Moreover, our framework processes the data injected into the SR so that suspects are identified by exploiting all the data made available by every participating organization. The main advantage of this approach is that the space and time windows used to detect complex attacks can be enlarged, thus sharpening the detection accuracy.

3.    THE SEMANTIC ROOM ABSTRACTION
A Semantic Room (SR) is a federation of financial institutions formed for the sake of information processing and sharing. The financial institutions participating in a specific SR are referred to as the members of the SR.

The SR abstraction is defined by the following three principal elements:

   • contract: each SR is regulated by a contract that defines the set of processing and data sharing services provided by the SR, along with the data protection, privacy, isolation, trust, security, dependability, and performance requirements. The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR. For the sake of brevity we do not include SR contract details in this paper; interested readers can refer to [1] for further information;

   • objective: each SR has a specific strategic objective to meet. For instance, there can exist SRs created for implementing large-scale stealthy scan detection, or SRs created for detecting Man-In-The-Middle attacks;

   • deployments: each SR can be associated with different software technologies, thus enabling a variety of SR deployments. The SR abstraction is in fact flexible enough to accommodate different technologies for implementing the processing and sharing within the SR (i.e., the implementation of the SR logic or functionality). In particular, the SR abstraction can support different types of system approaches to
     the processing and sharing: a centralized approach that employs a central server (e.g., Esper [9]); a decentralized approach where the processing load is spread over all the SR members (e.g., DHT-based processing); or a hierarchical approach where pre-processing is carried out by the SR members and selected processed information is then passed to the next layer of the hierarchy for further computation.

[Figure 1 (diagram): a Semantic Room containing the Complex Event Processing and Applications and the Communication components, connected through the Internet; raw data flow into the SR and processed data flow out for internal and external consumption.]
Figure 1: The Semantic Room abstraction

Figure 1 depicts the SR abstraction. As shown in the figure, the SR abstraction supports the deployment of two components, termed Complex Event Processing and Applications, and Communication, which can vary from SR to SR depending on the software technologies employed to implement the SR processing and sharing logic, as well as a set of management components, which together form the marble rectangle in Figure 1 and are exploited for SR management purposes (e.g., membership management, monitoring of SR operations). Section 4 describes all these components in detail.

SR members can inject raw data into the SR. Raw data may include real-time data, inputs from human beings, stored data (e.g., historical data), queries, and other types of dynamic and/or static content that are processed in order to produce complex processed data. Raw data are managed so as to satisfy the privacy requirements that can be prescribed by the SR contract and that are crucial to effectively enable a collaborative environment among different, and potentially competing, financial institutions.

Processed data can be used for internal consumption within the SR: in this case, derived events, models, profiles, blacklists, alerts and query results can be fed back into the SR so that the members can take advantage of the intelligence provided by the processing (Figure 1). SR members can, for instance, use these data to properly instruct their local security protection mechanisms in order to trigger informed and timely reactions independently of the SR management. In addition, a (possibly post-processed) subset of the data can be offered for external consumption, as SRs may be willing to make their processed data available for external use. In this case, in addition to the SR members, there can exist clients of the SR that cannot contribute raw data directly to the SR but can simply consume the SR processed data (Figure 1). SR members have full access to both the raw data that members have agreed by contract to contribute and the data processed and thus output by the SR. Data processing and results dissemination are carried out by SR members based on the obligations and restrictions specified in the above-mentioned contract.

4.    THE FRAMEWORK
The framework that supports the Semantic Room abstraction over a pool of (locally and geographically) distributed computational, storage, and network resources is shown in Figure 2.

In the framework we can clearly identify two principal layers: the SR Management layer, which is responsible for the management of the SR, and the Complex Event Processing and Applications layer, which realizes the SR processing and sharing logic. In addition, all the architectural components of both layers can use various commodity services for (i) exchanging control and monitoring information (such as load and availability) among them, (ii) managing resource allocation to the complex event processing and applications, both off-line and at run time, through the use of controllers and schedulers, and (iii) storing processing state and data. In the following subsections we describe the layers of our framework in more detail.

4.1    Semantic Room Management
This layer is responsible for supporting the SR abstraction on top of the individual processing and data sharing applications provided by the Complex Event Processing and Applications layer. It embodies a number of components fulfilling various SR management functions. Such functions include the management of the entire SR lifecycle (i.e., creation of an SR, instantiation of an SR, disbanding of an SR, management of the SR membership), the registration and discovery of SRs and SR contracts, the configuration and planning of SRs, the management of the communications among different SRs, and the management of trust and reputation within SRs [2]. In addition, each SR member interfaces with an SR through a component of this layer termed the SR gateway. This component transforms raw data into events in the format specified by the SR contract. In general, this transformation depends on the specific SR objective to be met and comprises three distinct pre-processing steps: filtering, aggregation, and anonymization. The latter consists of applying different anonymization techniques when privacy and confidentiality requirements are prescribed by the SR contract.

4.2    Commodity services
A number of services can be used by both of the layers previously mentioned in order to provide functionalities such as communication, storage, resource and contract management, and monitoring. These services are transversal to the other layers and are described separately in the following, highlighting several state-of-the-art design alternatives that can be used for their implementation.

Storage Services. The Storage Services layer of the architecture consists of a collection of components providing various kinds of storage services to the other layers. This layer can embody services for long-term storage of large data sets, such as monitoring logs and historical data (HDFS [5] is an option we currently use), and for low-latency storage of limited amounts of real-time data (in this case, an option we use is WebSphere XS [6]).
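The three pre-processing steps performed by the SR gateway in Section 4.1 (filtering, aggregation and anonymization) can be illustrated with the minimal Python sketch below. It is hypothetical and is not the framework's gateway: the raw firewall log format, the member's internal address range, the filtering rule, the event fields, and the use of an HMAC-SHA256 pseudonym for internal hosts are all assumptions standing in for whatever the SR contract actually prescribes.

```python
import hashlib
import hmac
from collections import Counter
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, List, Tuple

# Hypothetical raw log line produced by a member's firewall:
# "<src_ip> <dst_ip> <dst_port> <action>", e.g. "203.0.113.7 10.0.0.5 22 DROP".
INTERNAL_NET = ip_network("10.0.0.0/8")  # assumed internal address space of the member


@dataclass(frozen=True)
class SREvent:
    """Event format assumed to be prescribed by the SR contract."""
    src_ip: str     # external source, kept so members' data can be correlated
    dst_token: str  # keyed pseudonym of the member's internal host
    dst_port: int
    attempts: int


def filter_step(lines: Iterable[str]) -> Iterable[Tuple[str, str, int]]:
    """Filtering: keep only dropped connection attempts from external sources."""
    for line in lines:
        src, dst, port, action = line.split()
        if action == "DROP" and ip_address(src) not in INTERNAL_NET:
            yield src, dst, int(port)


def aggregate_step(records: Iterable[Tuple[str, str, int]]) -> Counter:
    """Aggregation: count attempts per (source, destination, port)."""
    return Counter(records)


def anonymize_step(counts: Counter, member_key: bytes) -> List[SREvent]:
    """Anonymization: replace internal destination addresses with keyed pseudonyms."""
    events = []
    for (src, dst, port), n in sorted(counts.items()):
        token = hmac.new(member_key, dst.encode(), hashlib.sha256).hexdigest()[:16]
        events.append(SREvent(src, token, port, n))
    return events


def sr_gateway(lines: Iterable[str], member_key: bytes) -> List[SREvent]:
    """Hypothetical gateway pipeline: filter, then aggregate, then anonymize."""
    return anonymize_step(aggregate_step(filter_step(lines)), member_key)


if __name__ == "__main__":
    raw = [
        "203.0.113.7 10.0.0.5 22 DROP",
        "203.0.113.7 10.0.0.5 22 DROP",
        "203.0.113.7 10.0.0.9 23 DROP",
        "10.1.2.3 10.0.0.5 445 DROP",        # internal source: filtered out
        "198.51.100.2 10.0.0.8 443 ACCEPT",  # accepted traffic: filtered out
    ]
    for event in sr_gateway(raw, member_key=b"member-secret"):
        print(event)
```

A keyed HMAC is used here rather than a plain hash because the IPv4 address space is small enough that unkeyed hashes of addresses can be reversed by exhaustive search. With a secret key held only by the member, the pseudonym of a given host stays stable across events, so repeated probes of the same host can still be counted, without revealing the address itself.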
[Figure 2 (diagram): several SRs built on the SR Management layer (e.g., SR Gateway, SR Registry, SR Manager) and the Complex Event Processing and Applications layer (event processing, analytics and web applications and containers; knowledge base management), supported by commodity services (Storage Services, Communication, Resource and Contract Management, Metrics Monitoring) over the Physical Resources.]
Figure 2: The framework for building collaborative protection environments


Communication. The Communication layer consists of a collection of components providing various kinds of communication services to the other layers. This layer can include large-scale group communication services [19], [18], reliable low-latency, high-throughput message streaming services that are useful for supporting real-time event streaming from external components into an SR, and publish/subscribe services [21].

Resource and contract management. The Resource and contract management layer is responsible for allocating physical resources (such as computational, storage, and communication resources) to the SRs so as to satisfy the business objectives (such as performance goals, data and resource sharing constraints, etc.) prescribed by the SR contracts. A capacity planning study might be completed in the preliminary phase of an SR startup. In this way, the Resource Management layer "can be aware" of the maximum capacity that each SR has to provide in terms of computational power, throughput, memory and storage. To this end, this layer can include a scheduler and placement controller for the initial allocation of services to physical resources, and a runtime load balancer for possible dynamic re-allocations [30, 26].

As each SR can count on a set of (locally and geographically) distributed data and computational resources, we find it convenient to consider the following three alternatives for SR deployment, all of which are supported by our framework:

   • SR-owned platform: the computational resources of each SR are owned by its members, although one member is designated as the SR administrator. The computational platform of an SR is fully dedicated to the complex event processing and applications of that SR.

   • Third party-owned platform: the computational resources are owned by a third party. The computational resources could be shared among the complex event processing and applications of the SRs. Collocation and data flows can be subject to some restrictions specified by the SR contract. This alternative corresponds to the so-called "SR as a service".

   • Mixed platform: the computational resources of each SR are owned by the members of that SR. This platform runs the logic of its SR, but it can occasionally offer hosting services for running the logic of other SRs. This is allowed only after an explicit request from another SR whose complex event processing and applications exceed the capacity of its own computational platform, and it implies some business (and trust) relationship among the involved SRs.

Metrics Monitoring. The Metrics Monitoring layer is responsible for monitoring the architecture in order to assess whether the requirements specified by the SR contract are effectively met. As Metrics Monitoring is a transversal layer of the framework, it operates at both the SR Management and the Complex Event Processing and Applications layers. In particular, at the SR Management layer it is in charge of periodically collecting monitoring information related to the management of SRs (e.g., SR membership information) in order to detect whether that management violates the requirements included in the SR contracts. Metrics Monitoring keeps track of the dynamic behavior of the SRs and checks whether or not SRs and SR members themselves are honoring their respective contracts. If Metrics Monitoring detects that SR contracts are close to being violated, it interacts with SR Management components in order to trigger proper reconfiguration activities.

At the Complex Event Processing and Applications layer (see below), Metrics Monitoring is in charge of periodically evaluating whether or not the resource management required by this layer is effectively able to support the execution carried out within it. In addition, it is responsible for detecting whether the processing execution violates any of the requirements specified in the SR contracts. Metrics Monitoring uses "sensors", possibly located at the physical resource and container levels, in order to obtain the information required for enforcing the metrics of interest (in our current implementation we favored the Nagios monitoring technology [28] for metrics monitoring purposes).
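The contract check performed by Metrics Monitoring, including the detection of contracts that are close to being violated, could look like the following minimal Python sketch. The requirement names, bounds and the 10% warning margin are hypothetical, and measurements are passed in as a plain dictionary; in the implementation described above they would be gathered by sensors such as Nagios checks, which this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    OK = "ok"
    NEAR_VIOLATION = "near_violation"  # would trigger reconfiguration
    VIOLATED = "violated"


@dataclass(frozen=True)
class Requirement:
    """Hypothetical SR contract clause: the metric must stay <= bound,
    or >= bound when lower_is_better is False."""
    metric: str
    bound: float
    lower_is_better: bool = True


def evaluate(req: Requirement, value: float, margin: float = 0.10) -> Status:
    """Classify one measurement against its bound, warning within `margin` of it."""
    if req.lower_is_better:
        if value > req.bound:
            return Status.VIOLATED
        if value > req.bound * (1 - margin):
            return Status.NEAR_VIOLATION
    else:
        if value < req.bound:
            return Status.VIOLATED
        if value < req.bound * (1 + margin):
            return Status.NEAR_VIOLATION
    return Status.OK


def check_contract(reqs: List[Requirement],
                   measurements: Dict[str, float]) -> List[Tuple[str, Status]]:
    """Return the clauses needing attention; a missing metric counts as violated."""
    alerts: List[Tuple[str, Status]] = []
    for req in reqs:
        value = measurements.get(req.metric)
        status = Status.VIOLATED if value is None else evaluate(req, value)
        if status is not Status.OK:
            alerts.append((req.metric, status))
    return alerts


if __name__ == "__main__":
    contract = [
        Requirement("detection_latency_s", 60.0),                    # at most 60 s
        Requirement("events_per_s", 5000.0, lower_is_better=False),  # at least 5000 events/s
        Requirement("active_members", 3, lower_is_better=False),     # membership clause
    ]
    sample = {"detection_latency_s": 57.0, "events_per_s": 8000.0, "active_members": 3}
    for metric, status in check_contract(contract, sample):
        print(metric, status.value)
```

With the sample measurements, the latency clause (57 s against a 60 s bound) and the membership clause (exactly at its minimum) are reported as near violations, the point at which Metrics Monitoring would ask SR Management to trigger reconfiguration; throughput is well within its bound and is not reported.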
4.3    Semantic Room Complex Event Processing and Applications
This layer consists of applications implementing the data
processing and sharing logic required to support the SR
functionality. A typical application hosted in an SR needs to
fuse and analyze large volumes of incoming raw data produced by
numerous heterogeneous and possibly widely distributed sources,
such as sensors, intrusion and anomaly detection systems,
firewalls, and monitoring systems. The incoming data are either
analyzed in real time, possibly with the assistance of
analytical models, or stored for subsequent off-line analysis
and intelligence extraction. This suggests a characterization of
the applications supported by an SR, whose runtime instances can
be hosted within various runtime container components. In
particular, the application containers, which can be either
standalone or clustered, are the following:

   • Event Processing Container: This container is responsible
     for supporting event-processing applications in a
     distributed environment. The applications manipulate and/or
     extract patterns from streams of event data arriving in real
     time from possibly widely distributed sources, and must
     support stringent guarantees in terms of response time
     and/or throughput.

   • Analytics Container: This container is responsible for
     supporting parallel processing of, and queries over, massive
     data sets on a cluster of machines. It is used to support
     the analytics and data warehousing applications hosted in
     the SR.

   • Web Container: This container provides basic web
     capabilities to support the runtime needs of the web
     applications hosted within an SR. These applications
     implement the logic enabling the interaction between
     client-side presentation artifacts (such as web-browser-based
     consoles, dashboards, widgets, rich web clients, etc.) and
     the processing applications.

Different implementations of the Complex Event Processing and
Applications layer can be supported by our framework. For
example, it is possible to deploy an SR that uses a central
server to implement both the event processing and the analytics
containers. In this case, the financial institutions of the SR
send their data to the central server (Esper [9] is an example
of a centralized event correlation engine that can be used in
this case). The central engine correlates and analyzes the data
and sends the processed results back to the financial
institutions, so that each institution can adopt its own
countermeasures in a timely fashion.

Although this solution is fully supported by our framework, it
suffers from the inherent drawbacks of a centralized system.
The central server may become a single point of failure or a
security vulnerability: if the server crashes or is compromised
by a security attack, the complex event processing computation
it carries out may become unavailable or be jeopardized. In
addition, the volume of events the central server can process
per time unit is limited by the server's processing and
bandwidth capacities, which limits system scalability.
Therefore, in our current implementation of the architecture we
favored technologies that allow us to realize decentralized
complex event processing. In particular, we used both MapReduce
[23] and DHT-based technologies to implement the SR-specific
logic, as described in the next section.

5.    SEMANTIC ROOM INSTANCES
In order to show the flexibility of the proposed SR abstraction,
in this section we describe two SRs that differ from one another
both in the objective they fulfill and in the software
technologies used to implement the processing and sharing logic
that meets that objective. Specifically, the next subsections
introduce an SR for collaborative intrusion detection, which
uses MapReduce-based [23] technology, and an SR for
collaborative Man-In-The-Middle (MitM) detection, which
distributes the load among processing elements through a DHT.
The two SRs are at different levels of maturity: the SR for
intrusion detection has been successfully deployed and tested,
whereas the one for collaborative Man-In-The-Middle detection is
currently under implementation, and its complete development and
testing are expected by June 2010.

5.1   Semantic Room for Collaborative Intrusion Detection
In this section, we discuss a specific instantiation of an SR,
which we refer to as ID-SR, whose objective is to prevent
potential intrusion attempts by detecting stealthy port scanning
activity. The targets of the attack are the web servers handling
the external web connectivity of the participating financial
institutions. These web servers typically run outside the
corporate firewall (in the DMZ) and are therefore frequently
targeted by attackers. The goal of the attack is to identify TCP
ports that might have been left open on the attacked servers.
The attack is carried out by initiating a series of TCP
connections to ranges of ports on each of the targeted DMZ
servers. The ports detected as open can later be used as
intrusion vectors.

Attack detection is based on identifying patterns of an
unusually high number of TCP SYN requests, possibly targeting an
unusually high number of ports, originating from the same
external IP address. The statistics are collected and analyzed
across the entire set of ID-SR participants, thus improving the
chances of identifying low-volume activities that would have
gone undetected if the individual participants relied
exclusively on their local protection systems. In addition, to
minimize the number of false positives, the real-time suspicions
are periodically calibrated through a reputation system that
ranks sites based on the history of malicious activities
originating from them.

To support data analysis, we implemented a distributed event
processing system, called Agilis, which is described in Section
5.1.1. The analysis logic employed by the ID-SR is detailed in
Section 5.1.2, and our initial experience with the Agilis-based
ID-SR prototype is presented in Section 5.1.3.
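Why pooling statistics across participants helps can be shown with a small sketch. It is our own Python illustration, not the ID-SR code: it applies a blacklisting rule of the kind described above (many SYN requests and many distinct targets from one source IP) first to each participant's data in isolation and then to the merged data of all participants. Following the Jaql flow of Figure 5(b), distinct targets are counted as (destination IP, destination port) pairs. The function names, data layout, and IP addresses are hypothetical; the numbers loosely mirror the prototype of Section 5.1.3 (7 attacked servers, 300 probed ports each, thresholds of 20,000 requests and 1000 ports), with the per-server request count chosen purely for illustration.

```python
from collections import defaultdict


def pooled_suspects(site_summaries, req_limit, port_limit):
    """Flag source IPs whose pooled activity exceeds both limits.

    site_summaries: one dict per participant, mapping a source IP to a
    pair (number of SYN requests seen, set of (dest_ip, dest_port)
    targets probed). Request counts are summed and target sets are
    unioned across participants before the limits are applied.
    """
    syn_counts = defaultdict(int)
    targets = defaultdict(set)
    for summary in site_summaries:
        for src, (n_syn, probed) in summary.items():
            syn_counts[src] += n_syn
            targets[src] |= probed
    return sorted(
        src for src in syn_counts
        if syn_counts[src] > req_limit and len(targets[src]) > port_limit
    )


def local_suspects(summary, req_limit, port_limit):
    """The same rule applied by one participant to its own data only."""
    return pooled_suspects([summary], req_limit, port_limit)


if __name__ == "__main__":
    REQ_LIMIT, PORT_LIMIT = 20_000, 1000
    attacker, benign = "203.0.113.7", "198.51.100.20"
    sites = []
    for i in range(7):                       # 7 participants, one DMZ server each
        server = f"10.0.{i}.1"
        sites.append({
            attacker: (3_000, {(server, p) for p in range(1, 301)}),  # 300 ports
            benign: (500, {(server, 443)}),
        })
    print("local:", [local_suspects(s, REQ_LIMIT, PORT_LIMIT) for s in sites])
    print("pooled:", pooled_suspects(sites, REQ_LIMIT, PORT_LIMIT))
```

Each participant alone sees 3,000 requests and 300 targets from the attacker, below both limits, whereas the pooled view sees 21,000 requests and 2,100 distinct targets and flags the attacker.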
Figure 3: MapReduce-based Semantic Room
[Figure: SR gateways (GW); WXS partitions; MapReduce tasks (MR);
HDFS; analysis results; WXS management (Cat 1, Cat 2); Agilis
front-end with admin console, processing logic (Jaql), and
config; Hadoop/HDFS management (JT, TT, ZK, CM).]
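The MapReduce processing model of Figure 3 can be illustrated, independently of Hadoop, with a small in-process sketch. It is our own Python illustration, not code generated by Jaql or Agilis: it re-expresses the two group-by stages of the ID-SR Summarization flow (given in Jaql in Figure 5(b) and described in Section 5.1.2) as two chained map, shuffle, and reduce passes. Field names follow the Jaql fragment; the map_reduce helper and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Tiny in-process MapReduce: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)            # shuffle phase
    return [reducer(k, vs) for k, vs in groups.items()]


# Stage 1: count SYN requests per (source IP, destination IP, destination port).
def map_syn(event):
    if event["returnStatus"] == "SYN":       # mirrors the input filter in Jaql
        yield (event["sourceIp"], event["destIp"], event["destPort"]), 1


def reduce_count(key, ones):
    return {"sourceIp": key[0], "numReq": sum(ones)}


# Stage 2: per source IP, number of distinct targets and total requests.
def map_by_ip(record):
    yield record["sourceIp"], record["numReq"]


def reduce_summary(ip, counts):
    return {"ip": ip, "portsNum": len(counts), "reqNum": sum(counts)}


def summarize(log_events):
    """Chain both stages and return summary records sorted by IP."""
    stage1 = map_reduce(log_events, map_syn, reduce_count)
    return sorted(map_reduce(stage1, map_by_ip, reduce_summary),
                  key=lambda r: r["ip"])


if __name__ == "__main__":
    events = [
        {"sourceIp": "a", "destIp": "s1", "destPort": 22, "returnStatus": "SYN"},
        {"sourceIp": "a", "destIp": "s1", "destPort": 22, "returnStatus": "SYN"},
        {"sourceIp": "a", "destIp": "s1", "destPort": 80, "returnStatus": "SYN"},
        {"sourceIp": "b", "destIp": "s1", "destPort": 443, "returnStatus": "ACK"},
    ]
    print(summarize(events))
```

In Agilis the shuffle crosses machines and the mappers read their input splits from WXS partitions; here everything runs in one process, which keeps the sketch focused on how the two group-by stages decompose.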


5.1.1    The Agilis System
Agilis (see Figure 3) consists of a distributed network of
processing and storage elements hosted on a cluster of machines
allocated from the ID-SR hardware pool (as prescribed by its
contract). The processing is based on Hadoop's MapReduce
framework [23]. The processing logic is specified in a
high-level language called Jaql [12], which compiles into a
series of MapReduce jobs. To improve detection latency, the
mappers and reducers communicate through buffers stored in a
main-memory storage system, IBM WebSphere eXtreme Scale (WXS)
[6]. The individual components of the Agilis framework are
illustrated in Figures 3 and 4 and described in detail below.

WebSphere eXtreme Scale (WXS). WXS is a distributed,
main-memory-based storage system implemented in Java. It allows
user data to be organized into a collection of maps consisting
of either relational records or key-value pairs. At runtime,
the data are stored in Data Servers (containers) hosted on a
cluster of machines. Clients can query the stored data using
either a simple get/set API or full-blown SQL queries, which
can be executed either on the client or within a container
using an embedded SQL engine.

For scalability, a map's data can be split into a fixed number
of partitions, which the WXS runtime then distributes evenly
among the WXS containers. In addition, for fault tolerance,
each map partition can be replicated on a configured number of
containers. Information about the operational containers, as
well as the layout of the hosted map partitions and their
replicas, is maintained at runtime in the WXS Catalog service,
which is typically replicated for high availability.

Processing framework. The processing is carried out on the
machines within the ID-SR cluster and orchestrated through the
optimized Hadoop scheduling framework. The latter consists of a
centralized Job Tracker (JT), which coordinates the local
execution of mappers and reducers on each of the ID-SR nodes
through a collection of Task Trackers (TT), one per machine.

Most of our scheduling optimizations targeted processing
locality, by scheduling the map tasks close to the WXS
partitions holding their respective input splits. To match the
input splits with the WXS partitions, we provided a new
implementation of Hadoop's InputFormat interface, which was
packaged with every Agilis MapReduce job submitted to the JT.
The getSplits method of this interface was used by the JT to
determine the split locations at runtime (obtained by querying
the WXS Catalog service), and the createRecordReader method was
used to create a RecordReader instance that reads the data from
the corresponding WXS partition. To further improve locality,
our RecordReader implementation recognized SQL select, project,
and aggregate queries (by interacting with the Jaql interpreter)
and delegated their execution to the SQL engine embedded in the
WXS container.

In many cases, this approach substantially reduced the volume
of intermediate data reaching the reducers, thus improving
latency and bandwidth utilization and reducing processing
costs. It also allowed us to further enforce the privacy of the
input data submitted by individual ID-SR members, by scheduling
the initial map processing on machines residing within their
administrative boundaries.

Figure 4: The Components of the Agilis Runtime.
[Figure: a client machine with the Jaql interpreter submits a
Hadoop job (Jaql snippets for map and reduce, the Jaql
interpreter, InputFormat and OutputFormat) to Hadoop's
MapReduce (Job Tracker and Task Trackers), which runs within
Agilis over the distributed in-memory store (WXS), made of
storage containers and catalog servers Cat 1 and Cat 2.]

Long-term data storage. The Hadoop Distributed File System
(HDFS) [5] provides storage for massive amounts of data that
must be preserved over time (e.g., historical data tracking
past attacks). The data stored in HDFS can be injected into
Hadoop through the provided
HDFS InputFormat implementation and combined with the WXS data
using the Jaql I/O constructs. HDFS is managed and kept
consistent by Hadoop's Chunk Manager (CM) and ZooKeeper (ZK)
services.

The Jaql language. The processing logic is expressed in a
high-level language called Jaql. Jaql supports SQL-like query
constructs that can be combined into flows, and it can
interoperate with a wide variety of data sources thanks to its
use of the JSON data model. As shown in Figure 4, in Agilis the
locally compiled Jaql flows are first augmented with input and
output formats to interoperate with WXS, and then submitted to
the modified Hadoop scheduler, which orchestrates their
execution on the ID-SR machines as explained above.

5.1.2    The ID-SR Processing Steps
The processing steps followed by the ID-SR implementation are
depicted in Figure 5(a). In the first step, the raw data
capturing the current network activity at each of the
participating machines is collected using the tcpdump utility
and forwarded to the local ID-SR gateway. Each gateway then
normalizes the incoming raw data, producing a stream of
LogEvent records of the form ⟨sourceIP, destinationIP,
sourcePort, destinationPort, bytesSent, bytesReceived,
returnStatus⟩. The LogEvent records are stored in a WXS
partition hosted on a locally deployed WXS container.

The incoming LogEvent records are then processed by a
collection of MapReduce jobs handled by Agilis. First, the
input records are passed through the Summarization flow (see
Figure 5(b)), which consists of two processing steps enclosed
by two I/O steps (one reading the input and one writing the
results). The outcome of the two processing steps is a
collection of summary records of the form ⟨sourceIP, portsNum,
reqNum⟩ giving, for each source IP address (sourceIP), the
number of distinct ports (portsNum) accessed from sourceIP
along with the total number of requests (reqNum) originating
from it.

The summary records are then fed into the Blacklisting flow
(see Figure 5(b)), which blacklists a source IP address if the
number of requests and the number of distinct ports accessed
from that address exceed fixed limits. In addition, the summary
records are joined with historical records of the form
⟨sourceIP, rank⟩, using sourceIP as the key, to adjust the
long-term rank representing the threat level of each IP
address. The historical records are used to periodically
calibrate the blacklist by excluding the IP addresses whose
ranks fall below a fixed threshold.

Figure 5: Data Flow and Jaql Query Fragments Used for Port Scan
Detection in ID-SR.
(a) Data flow. At each participant, tcpdump output passes
through a gateway (GW1, GW2, GW3, ...) and is stored as
normalized data [LogEvent]* in a local WXS partition (Part 1,
Part 2, ..., Part N). Parallelized MapReduce jobs perform
Summarization, producing summarized data [SourceIP, rNum,
pNum]*, which feeds both Blacklisting (producing the Black
List) and Ranking; Ranking maintains the historical safety
ranking [SourceIP, rank]*, which drives the Calibration of the
blacklist.
(b) Jaql query fragments:

    Summarization:
      read($ogPojoInput("LogEvents", "dataObj.LogEvent",
                        "returnStatus = 'SYN'"))
      -> group by $ip_port = {$.sourceIp, $.destPort, $.destIp} as $rs
           into {$ip_port.sourceIp, numReq: count($rs)}
      -> group by $ip = {$.sourceIp}
           into {ip: $ip.sourceIp, portsNum: count($),
                 reqNum: sum($[*].numReq)}
      -> write($ogPojoOutput("LogEventsSum", "dataObj.LogEventSum",
                             "ip"));

    Blacklisting:
      read($ogPojoInput("LogEventsSum", "dataObj.LogEventSum",
                        "reqNum > 10000 AND portsNum > 200"))
      -> transform $.ip
      -> write($ogPojoOutput("BlackList", "dataObj.HistoryData", "ip"));

5.1.3    The ID-SR Prototype
In this section, we report on our initial experience with
deploying and testing the ID-SR prototype on a small cluster of
8 Linux Virtual Machines (VMs), each equipped with 2 GB of RAM
and 20 GB of disk space. A large-scale experimental study using
PlanetLab [14] is currently in progress.

The layout of the Agilis components on the cluster was as
follows. One VM was dedicated to hosting all of the Agilis
management components: Hadoop Name Node, Job Tracker, and WXS
Catalog Server. Each of the remaining 7 VMs represented a single
ID-SR participant and hosted a Data Node, a Task Tracker, WXS
Data Servers (containers), and an external web server (the
target of the attack).

To assess the accuracy and timeliness of port scan detection,
we measured the performance of Agilis in 3 experimental
scenarios. All scenarios involved a single intruding host that
generated a series of TCP SYN requests targeting a fixed set of
300 unique ports on each of the 7 attacked servers. The requests
were injected at a constant rate of 10, 20, and 30
requests/server/second in the first, second, and third scenario,
respectively. The ratio of attack to legitimate traffic per
server was 1:5, resulting in a total (malicious and legitimate)
incoming event rate at each server of 60, 120, and 180
requests/second in the three scenarios, respectively. The
blacklisting threshold was set to 20,000 requests and 1000
unique ports. The Jaql flow in Figure 5 was compiled into 3
MapReduce jobs whose total running time was 110 s on average.
The jobs were executed periodically, once every 4
minutes.

In all of our experimental runs, Agilis was able to absorb the
entire volume of incoming event traffic without overload and to
correctly blacklist the single intruding host (and no other
host). The detection times (i.e., the time between the beginning
of the scan and the blacklisting of the intruder) were on
average 700, 430, and 330 seconds for the three scenarios,
respectively. This indicates that Agilis was able to take
advantage of the extra data points to reduce detection latency.
Note also that since the number of unique ports accessed at each
of the 7 attacked servers never exceeded 1000 (the detection
threshold) in any of the three scenarios, the ID-SR participants
would never have been able to identify the intruding machine had
they not cooperated and shared data through Agilis.

5.2   Semantic Room for Collaborative Man-In-the-Middle Detection
In a MitM attack, a legitimate user is induced to open a
connection with a malicious server that mimics the behavior of
the legitimate server. Different techniques can be used to
achieve this: DNS cache poisoning [24], compromise of an
intermediate routing node [20], phishing [31], and exploitation
of vulnerabilities in authentication mechanisms [33]. In
general, the malicious server stores the user credentials,
relays them to the legitimate server, and forwards the response
to the user on behalf of the legitimate server. The MitM node
eavesdrops on all data flowing between the client and the
server, thus gaining access to a large amount of sensitive
information. MitM attacks at a single node can usually be
detected by looking for anomalies in the access statistics of
Web and Application servers, i.e., by identifying an IP address
that contacts "too many" services with credentials belonging to
different users. The SR programming abstraction can be used
effectively here: aggregating and correlating the logs of
different SR members can significantly reduce the detection time
of such attacks, as the same IP address may also attack services
of other SR members.

The event processing carried out in this SR can be summarized as
follows. Processing Elements (PEs) are arranged in a DHT, as
shown in Figure 6. Each PE implements the event processing
container described in Section 4.3 and receives events from
different SR gateways. Each SR gateway pre-processes events
produced by raw information sources residing at a financial
institution, e.g., logs of Web or Application Server
technologies such as Tomcat or JBoss, deployed by the
institution to provide its customers with financial services
such as e-banking, e-payment, and e-trading. The raw data
injected into the SR gateway are filtered, aggregated, and
anonymized (if privacy requirements are mandated by the SR
contract). The output of the SR gateway is an event that is
injected into the SR. Events have the form ⟨source ip, session
id⟩, where source ip is the address originating a connection to
a financial institution server, and session id is the
concatenation of a unique identifier of the financial
institution offering the service, the requested service URL,
and the id of the user who authenticated to the service. This
information is sufficient to detect a MitM attack when there is
a suspicious number of connections originating from the same
source ip towards financial services with different user
credentials, i.e., different session ids.

Each PE consists of three main subsystems (Figure 6), namely the
Event Manager, the Overlay Manager, and the Complex Event
Processing (CEP) Engine, which perform the following functions:

   • the Event Manager receives events from the SR gateways and
     submits them to the Overlay Manager (see below). Moreover,
     it receives the results of the complex event processing
     from the CEP Engine of the PE and disseminates these results
     (i.e., alarms) to the SR gateways of the financial
     institutions that are members of the SR.

   • the Complex Event Processing (CEP) Engine processes events
     received from the Overlay Manager and submits the analysis
     results to the Event Manager. The
              Figure 6: The DHT-based Semantic Room and The Processing Element Architecture.


       processing required to identify MitM attacks is based      To this end, we have shown the design of two distributed
       on the computing of statistical anomalies on received      event processing schemes deployed within two different SRs;
       events. These anomalies can be abstracted by the fol-      namely, an SR that can be used for detecting stealthy scans
       lowing high level attack pattern: given a time window containing m events, find at least n events with the same source IP address and different session IDs. Currently, this subsystem is implemented without any of the general-purpose CEP engines available on the market; however, we are evaluating implementations based on either JBoss Drools Fusion [13] or Esper [9].

     • the Overlay Manager (OM) receives new events from the Event Manager and forwards them over the DHT, using the hash of the event's source IP address field as the DHT key. Our preliminary implementation uses FreePastry [11] as the DHT, since its proximity-based routing improves performance by exploiting node locality.

Once the addresses of malicious servers have been identified, they can be blacklisted and shared among SR members in order to block them, audit internal servers and identify customers that may have suffered the attack, thus providing better protection and security for client transactions.


6.    CONCLUDING REMARKS
In this paper we have described a framework that enables the construction of collaborative security environments whose aim is to defend financial institutions (e.g., banks, regulatory agencies) from coordinated Internet-based attacks. To do so, the framework supports a new programming abstraction named Semantic Room (SR). The SR allows financial institutions to process data and share information and computing resources in a controlled, trusted and secure manner. SRs are characterized by (i) an objective (i.e., the SR functionality), (ii) a contract that specifies the set of rules governing SR membership and the QoS requirements (e.g., security and performance requirements), and (iii) the software technologies that implement the specific SR processing and sharing logic. Owing to this last property, SRs are customizable: they can be instantiated in different ways, deploying various processing and sharing schemes to match the needs of financial institutions.

and an SR that can be used for detecting MiTM attacks. Specifically, the former relies on MapReduce technologies to parallelize the complex event processing computation executed at geographically distributed financial sites, whereas the latter uses Distributed Hash Table technologies and general-purpose CEP engines to carry out the processing. Both SRs use large windows, in space and time, to detect stealthy scans and MiTM attacks by correlating raw data coming from SR members.

Future work includes deploying and testing the MiTM SR and assessing the performance of both SRs. To this end, we are pursuing several directions: developing an Esper-based SR in order to also compare performance against a centralized correlation engine; carrying out a quantitative experimental evaluation of both SRs on large-scale platforms such as PlanetLab [14]; and building an Internet testbed formed by three sites, each equipped with a cluster of machines on which to deploy the large-scale processing environment.

7.   ACKNOWLEDGMENTS
We are indebted to Thomas Kohler (UBS), Finn Otto Hansen (SWIFT) and Guido Pagani (Bank of Italy), who greatly helped us better understand the strategies, constraints and needs of financial players.


8.   REFERENCES
[1] This reference is obscured in order to meet the double-blind requirement for the review process.
[2] This reference is obscured in order to meet the double-blind requirement for the review process.
[3] Basel II Accord. http://www.bis.org/bcbs/bcbscp3.htm, 2009.
[4] DShield: Cooperative Network Security Community - Internet Security. http://www.dshield.org/indexd.html/, 2009.
[5] Hadoop-HDFS Architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html, 2009.
[6] IBM WebSphere eXtreme Scale. http://www-01.ibm.com/software/webservers/appserv/extremescale/, 2009.
[7] National Australia Bank hit by DDoS attack. http://www.zdnet.com.au/news/security/soa/
    National-Australia-Bank-hit-by-DDoS-attack/0,130061744,339271790,00.htm, 2009.
[8] Update: Credit card firm hit by DDoS attack. http://www.computerworld.com/securitytopics/security/story/0,10801,96099,00.html, 2009.
[9] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.
[10] FBI investigates 9 Million ATM scam. http://www.myfoxny.com/dpp/news/090202_FBI_Investigates_9_Million_ATM_Scam, 2010.
[11] FreePastry. http://www.freepastry.org/FreePastry/, 2010.
[12] Jaql. http://www.jaql.org/, 2010.
[13] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.
[14] PlanetLab. http://www.planet-lab.org/, 2010.
[15] Asaf Adi and Opher Etzion. Amit - the situation manager. VLDB J., 13(2):177–203, 2004.
[16] Mert Akdere, Uğur Çetintemel, and Nesime Tatbul. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66–77, 2008.
[17] Lisa Amini, Navendu Jain, Anshul Sehgal, Jeremy Silber, and Olivier Verscheure. Adaptive control of extreme-scale stream processing systems. In ICDCS ’06: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, page 71, Washington, DC, USA, 2006. IEEE Computer Society.
[18] V. Bortnikov, G. V. Chockler, A. Roytman, and M. Spreitzer. Bulletin Board: A Scalable and Robust Eventually Consistent Shared Memory over a Peer-to-Peer Overlay. In ACM LADIS 2009, 2009.
[19] G. V. Chockler, I. Keidar, and R. Vitenberg. Group communication specifications: a comprehensive study. ACM Computing Surveys, 33(4):427–469, 2001.
[20] T. Espiner. Symantec warns of router compromise. http://www.zdnetasia.com/news/security/0,39044215,62036991,00.htm, 2010.
[21] P. T. Eugster, P. A. Felber, R. Guerraoui, and A. Kermarrec. The many faces of publish/subscribe. ACM Computing Surveys, 35(2):114–131, 2003.
[22] Y. Huang, N. Feamster, A. Lakhina, and J. J. Xu. Diagnosing network disruptions with network-wide analysis. In SIGMETRICS’07, San Diego, California, USA, 12-16 June 2007.
[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[24] A. Klein. BIND 9 DNS cache poisoning. http://www.trusteer.com/files/BIND_9_DNS_Cache_Poisoning.pdf, 2010.
[25] M. E. Locasto, J. J. Parekh, A. D. Keromytis, and S. J. Stolfo. Towards collaborative security and p2p intrusion detection. In IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 15-17 June 2005.
[26] G. Lodi, F. Panzieri, D. Rossi, and E. Turrini. SLA-Driven Clustering of QoS-aware Application Servers. IEEE Transactions on Software Engineering, 33(3), 2007.
[27] N. Verma, F. Trousset, P. Poncelet, and F. Masseglia. Intrusion Detections in Collaborative Organizations by Preserving Privacy. In Advances in Knowledge Discovery and Management, December 2009.
[28] Nagios. http://www.nagios.org, 2010.
[29] P. R. Pietzuch. Hermes: A Scalable Event-Based Middleware. Ph.D. thesis, University of Cambridge.
[30] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici. A Scalable Application Placement Controller for Enterprise Data Centers. In 16th International Conference on World Wide Web, 2007.
[31] TriCipher. The perfect storm: Man in the middle attacks, weak authentication and organized online criminals. http://www.tricipher.com/landing_pages/spotlight_offer.html, 2010.
[32] Y. Xie, V. Sekar, M. K. Reiter, and H. Zhang. Forensic analysis for epidemic attacks in federated networks. In ICNP, pages 143–153, 2006.
[33] Y. Espelid, L. Netland, A. N. Klingsheim, and K. J. Hole. Robbing banks with their own software - an exploit against Norwegian online banks. In IFIP 23rd International Information Security Conference, September 2008.
[34] G. Zhang and M. Parashar. Cooperative detection and protection against network attacks using decentralized information sharing. Cluster Computing, 13(1):67–86, 2010.
[35] Xiaolan J. Zhang, Henrique Andrade, Buğra Gedik, Richard King, John Morar, Senthil Nathan, Yoonho Park, Raju Pavuluri, Edward Pring, Randall Schnier, Philippe Selo, Michael Spicer, Volkmar Uhlig, and Chitra Venkatramani. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In WHPCF ’09: Proceedings of the 2nd Workshop on High Performance Computational Finance, pages 1–8, New York, NY, USA, 2009. ACM.
[36] C. V. Zhou, S. Karunasekera, and C. Leckie. A peer-to-peer collaborative intrusion detection system. In 13th IEEE International Conference on Networks, Kuala Lumpur, Malaysia, November 2005.
[37] C. V. Zhou, C. Leckie, and S. Karunasekera. A survey of coordinated attacks and collaborative intrusion detection. Computers & Security, 29:124–140, 2010.