

ACCESS CONTROL SYSTEMS AND METHODOLOGY

Security Event Analysis through Correlation

Anton Chuvakin, Ph.D., GCIA, GCIH

The security spending survey by Information Security (http://www.…coverstory.pdf) and recent research by Forrester indicate that deployment rates of many security technologies will soar in the next three years. According to some estimates, security budgets (and thus technology purchases) will double by 2006.

INTRODUCTION TO SECURITY DATA ANALYSIS
Almost every Internet-connected organization now has a firewall as part of its network infrastructure; most Windows networks have an anti-virus solution. Intrusion detection systems (IDSs) are slowly but surely gaining wider acceptance, and intrusion prevention is starting to show more promise despite the obvious hurdles. New types of application security products, such as Web application firewalls, are starting to be deployed by security-conscious organizations. This buying trend is further reinforced by the growing popularity of so-called “appliance” security systems, which are very easy to install and manage. Appliances combine software and hardware in one package and usually have much lower installation and maintenance costs, thus facilitating their adoption.

All the above devices, whether aimed at prevention or detection of attacks, usually generate huge volumes of audit data. Firewalls, routers, switches, and other devices recording network connection information are especially guilty of producing vast oceans of data.

There are other problems induced by this log deluge, turning its analysis into a pursuit few dare to undertake. Many diverse data formats and representations, some binary,¹ obscure, and undocumented, are used for those log files and audit trails. Also, a percentage of events generated by network IDSs and intrusion prevention systems (IPSs) are false alarms: they either do not map to real threats or map to threats that have no chance of causing loss.

To further confuse the issue, different devices might report on the same things happening on the network, but in different ways, with no apparent way of figuring out how their reports relate. For example, a UNIX log file might contain an FTP connection message. The same connection will also be recorded by the firewall as “connection allowed to TCP port 21.” A network IDS might also generate an alert, warning that an FTP log-in with no password has occurred. All three messages refer to the same event, and a human analyst will recognize them as such.
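To make the FTP example concrete, here is a minimal Python sketch of normalizing the three records into a common schema and then grouping records that share a source and destination within a short time window. The three raw log formats, the `NormalizedEvent` fields, the five-second window, and the IP addresses are all hypothetical, invented for illustration; real firewall, IDS, and UNIX log formats differ from product to product.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical raw formats, invented for this sketch; real products differ.
PARSERS = {
    "unix_ftpd": re.compile(
        r"(?P<time>\S+) ftpd: FTP connection from (?P<src>\S+) to (?P<dst>\S+)"),
    "firewall": re.compile(
        r"(?P<time>\S+) ALLOW tcp (?P<src>\S+) -> (?P<dst>[^\s:]+):(?P<port>\d+)"),
    "nids": re.compile(
        r'(?P<time>\S+) ALERT "(?P<msg>[^"]+)" (?P<src>\S+) -> (?P<dst>[^\s:]+):(?P<port>\d+)'),
}


@dataclass
class NormalizedEvent:
    """Common schema: fields that serve the same purpose in every source."""
    time: datetime
    src_ip: str
    dst_ip: str
    dst_port: int
    device: str
    message: str


def normalize(device: str, line: str) -> NormalizedEvent:
    match = PARSERS[device].match(line)
    if match is None:
        raise ValueError(f"unparseable {device} record: {line!r}")
    f = match.groupdict()
    return NormalizedEvent(
        time=datetime.fromisoformat(f["time"]),
        src_ip=f["src"],
        dst_ip=f["dst"],
        # The FTP daemon line carries no port; assume the FTP control port, 21.
        dst_port=int(f.get("port") or 21),
        device=device,
        message=f.get("msg") or line,
    )


def related(events: List[NormalizedEvent],
            window: timedelta = timedelta(seconds=5)) -> List[List[NormalizedEvent]]:
    """Group events sharing source and destination that occur within `window`
    of the first event in their group."""
    groups: List[List[NormalizedEvent]] = []
    for ev in sorted(events, key=lambda e: e.time):
        for group in groups:
            head = group[0]
            if ((head.src_ip, head.dst_ip) == (ev.src_ip, ev.dst_ip)
                    and ev.time - head.time <= window):
                group.append(ev)
                break
        else:
            groups.append([ev])
    return groups


raw = [
    ("unix_ftpd", "2004-05-10T10:15:02 ftpd: FTP connection from 10.1.1.5 to 192.168.0.10"),
    ("firewall", "2004-05-10T10:15:01 ALLOW tcp 10.1.1.5 -> 192.168.0.10:21"),
    ("nids", '2004-05-10T10:15:03 ALERT "FTP login with no password" 10.1.1.5 -> 192.168.0.10:21'),
]
events = [normalize(device, line) for device, line in raw]
groups = related(events)
print(len(groups), [e.device for e in groups[0]])  # 1 ['firewall', 'unix_ftpd', 'nids']
```

Matching on source, destination, and time alone is deliberately crude; it shows why normalization to shared fields has to come before any attempt to relate records from different devices.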

ANTON CHUVAKIN, Ph.D., GCIA, GCIH, is a Senior Security Analyst at a major security company. His
areas of infosec expertise include intrusion detection, UNIX security, forensics, honeypots, etc. In his
spare time he maintains his security portal at

ACCESS CONTROL SYSTEMS AND METHODOLOGY, MAY/JUNE 2004
However, programming a system to do that is much more challenging, especially for a broad spectrum of messages. Thus, there is a definite need for a consistent analysis framework to identify various network threats, prioritize them, and learn their impact on the target organization. This must be done as fast as possible (preferably in real-time) for attack identification and also over the long term for threat trending and risk analysis.

To understand the meaning of the accumulating logs, the data in them can be categorized in several ways. It should be noted that before the data can be intelligently categorized, it should be normalized to a common schema. The normalization process involves extracting the parts of the log records that serve a common purpose and assigning them to specific fields in the common schema. For example, both firewall and network IDS log records will usually contain the source and destination IP addresses. If you see both firewall and IDS logs referring to the same source and destination at about the same time, they are likely related.

Log categorization helps make the similarity between different log records stand out. For example, the log data generated across many security devices, hosts, and applications might be related to:
■ Device performance data
■ Network traffic
■ Known attacks
■ Known network/system problems
■ Anomalous/suspicious network/host activity
■ Access control decisions
■ Software failures
■ Hardware errors
■ System changes
■ Evidence of malicious agents
■ Site-specific AUP² violations

Each of the above types of events presents unique analysis challenges. For example, some are produced in much higher numbers (network access control, worm events), while others are often not what they seem at first (such as network IDS “false positives”). Moreover, sometimes the threat can only be identified and rated by cross-device and cross-category analysis of the above events.

Many questions arise upon seeing the above data. How do you turn that flood of data into useful and actionable information? How do you find what is really relevant for the organization at the moment and for the near future? How do you tell normal log records, produced in the course of business, from the anomalous and malicious ones, produced by attackers or misbehaving software?

Correlation performed by SIM (Security Information Management) software is believed to be the solution to those challenges. The dictionary defines correlation as establishing or finding relationships between entities; however, a good security-specific definition is lacking. In security, “event correlation” can be defined as improving the threat identification and assessment process by looking not only at individual events, but also at sets of events bound by some common parameter (“related”).

TYPES OF CORRELATION
Security-specific correlation can be loosely categorized into rule-based and statistical (or algorithmic). Rule-based correlation needs some preexisting knowledge of the attack (“the rule”) and is able to define what it actually detected in precise terms (“Successful Shopping Cart Web Application Attack”). Such attack knowledge is used to relate events and analyze them together in a broader context.

On the other hand, statistical correlation does not employ any preexisting knowledge of the “bad” activity (at least not as a primary detection vehicle), but instead relies on the knowledge of normal activities, accumulated over time. Ongoing events are then rated by the built-in algorithm and are additionally compared to the accumulated activity patterns.

This distinction is somewhat similar to signature versus anomaly IDS and makes the SIM solution a kind of meta-IDS, operating on higher-level data (not packets, but log records). Both correlation methods combined can help to sift through the large volume of diverse data and identify high-severity threats.

INFORMATION SYSTEMS SECURITY, MAY/JUNE 2004

Rule-Based Correlation
Rule-based correlation uses some preexisting knowledge of an attack (a rule), which is essentially a scenario that an attack must follow to be detected. Such a scenario might be encoded in the form of “if this, then that, therefore some action is needed.”

Rule-based correlation deals with states, conditions, timeouts, transitions, and actions. Let us define these important terms. A state is a stage that the correlation rule can be in while it waits for further events. A state might contain various conditions, such as matching incoming events by the source IP address, protocol, port, event type, producing security device type, username, and other components of the event. It should be noted that although such data components vary with the device, the SIM solution normalizes them using the cross-device event schema without incurring information loss. A timeout defines how long the rule can remain in a given state. If the correlation engine has to keep many rules in a waiting state in memory, memory might be exhausted; thus, rule timeouts play an important role in correlation performance. A transition is the switch from one rule state to another. For a complicated rule, many transitions are possible. An action is what happens when all the rule conditions are met. Various actions can result from rules, such as user notification, alarm escalation, configuration changes, or automatic incident case investigation.

Correlation is usually performed by a correlation engine, which is able to track various states and switch from state to state, depending on conditions and incoming events. It does all of this for multiple rules at the same time. The correlation engine gets a real-time event feed from the alarm-generating security devices and applies the relevant correlation rules as needed. The correlation engine also leverages other types of available data (such as vulnerability, open port, or asset business value information) for a higher level of correlation.

Correlation rules can be applied to incoming events as they arrive in real-time or to historical events stored in the database. In the latter case, the rules are used as a form of data mining or analytics, which allows for uncovering hidden threats such as slow port scans or low-level Trojan or exploitation activity. Such rules can be run periodically for incident identification, or in the course of investigating suspicious activity, to seek out prior occurrences of similar (and thus possibly related) activity. Unlike real-time rules, which become useless if prone to false alarms (just as signature-based IDSs sometimes are), database rules can tolerate a certain level of false alarms for the purpose of drastically reducing false negatives. This is because real-time rules usually feed the alarm notification system, while database rule correlation is launched by the analyst during the security incident investigation. As long as the rule-based analytics uncovers a hidden threat that would be impossible to discover otherwise, an analyst might be able to tolerate a level of false alarms that would not be acceptable for real-time correlation.

Statistical Correlation
Statistical correlation uses special numeric algorithms to calculate threat levels incurred by the security-relevant events on various IT assets. Such correlation looks for deviations from normal event levels and other routine activities. Risk levels can be computed from the incoming events and then tracked in real-time or historically so that deviations are apparent. The algorithmic correlation can leverage the event categorization in order to compute the threat levels specific to various attack types, such as a threat of denial-of-service, a threat of viruses, etc., and track them over time.
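The rule-based correlation mechanics described earlier in this section (states, conditions, timeouts, transitions, and actions) can be illustrated with a small Python sketch of a two-stage rule: a reconnaissance event moves a rule instance from idle to waiting, and an exploit event from the same source against the same destination, arriving before the timeout, fires the action. The `Event`, `SequenceRule`, and `CorrelationEngine` names, the category labels, and the 600-second timeout are hypothetical; this is a toy under those assumptions, not how any particular SIM product implements its engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    """A normalized event (hypothetical minimal schema)."""
    time: float      # seconds since some epoch
    category: str    # normalized category, e.g. "recon" or "exploit"
    src_ip: str
    dst_ip: str


@dataclass
class SequenceRule:
    """Two-stage rule: a `first` event, then a `second` event with the same
    source and destination within `timeout` seconds."""
    name: str
    first: str
    second: str
    timeout: float
    action: Callable[[str, Event, Event], None]


class CorrelationEngine:
    def __init__(self, rules: List[SequenceRule]) -> None:
        self.rules: Dict[str, SequenceRule] = {r.name: r for r in rules}
        # Rule instances in the WAITING state, keyed by (rule name, src, dst).
        # The value is the event that caused the IDLE -> WAITING transition.
        self.waiting: Dict[Tuple[str, str, str], Event] = {}

    def _expire(self, now: float) -> None:
        # Timeouts drop stale instances, which also bounds memory use.
        for key, started in list(self.waiting.items()):
            if now - started.time > self.rules[key[0]].timeout:
                del self.waiting[key]

    def process(self, ev: Event) -> None:
        self._expire(ev.time)
        for rule in self.rules.values():
            key = (rule.name, ev.src_ip, ev.dst_ip)
            if ev.category == rule.second and key in self.waiting:
                # WAITING -> FIRED: every condition is met, so run the action.
                rule.action(rule.name, self.waiting.pop(key), ev)
            elif ev.category == rule.first:
                # IDLE -> WAITING (a repeated first event restarts the timer).
                self.waiting[key] = ev


# Usage: the action just records an alert instead of notifying anyone.
alerts = []
engine = CorrelationEngine([SequenceRule(
    "recon-then-exploit", "recon", "exploit", 600.0,
    lambda name, first, second: alerts.append((name, second.src_ip, second.dst_ip)))])

for ev in [
    Event(0.0, "recon", "10.1.1.5", "192.168.0.10"),
    Event(120.0, "exploit", "10.1.1.5", "192.168.0.10"),  # within timeout: fires
    Event(130.0, "exploit", "10.2.2.2", "192.168.0.10"),  # no prior recon: ignored
    Event(200.0, "recon", "10.3.3.3", "192.168.0.20"),
    Event(900.0, "exploit", "10.3.3.3", "192.168.0.20"),  # 700 s later: timed out
]:
    engine.process(ev)

print(alerts)  # [('recon-then-exploit', '10.1.1.5', '192.168.0.10')]
```

Tuning a rule like this means changing the timeout or adding conditions (for example, also requiring a matching destination port), rather than editing low-level matching patterns.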

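One simple way to compute a threat level of the kind just described is to measure how far the current count of events in each category deviates from a baseline learned from history, and to weight the result by the value of the asset. The sketch below uses a standard-deviation (z-score) measure; that choice, the function names, the threshold of 3, and the sample counts are assumptions for illustration, not the algorithm of any specific SIM product.

```python
from statistics import mean, pstdev
from typing import Dict, List


def threat_level(history: List[int], current: int, asset_value: float = 1.0) -> float:
    """Standard deviations by which `current` exceeds the historical mean,
    weighted by the asset's value (0 if activity is at or below normal)."""
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # flat baseline: avoid division by zero
    return max(0.0, (current - mu) / sigma) * asset_value


def deviations(baselines: Dict[str, List[int]], current: Dict[str, int],
               asset_value: float, threshold: float = 3.0) -> Dict[str, float]:
    """Return the event categories whose weighted threat level exceeds `threshold`."""
    scores = {cat: threat_level(hist, current.get(cat, 0), asset_value)
              for cat, hist in baselines.items()}
    return {cat: score for cat, score in scores.items() if score > threshold}


# Hourly event counts per category observed on one server (hypothetical data).
baselines = {
    "recon": [10, 12, 9, 11, 13, 10, 12, 11],
    "dos":   [0, 1, 0, 0, 1, 0, 0, 0],
    "virus": [3, 4, 3, 5, 4, 3, 4, 4],
}
current = {"recon": 30, "dos": 0, "virus": 4}

# Only the "recon" category exceeds the threshold for this asset.
print(deviations(baselines, current, asset_value=2.0))
```

Note that a handful of extra events barely changes such a score, so activity kept at low volume can stay under the threshold; a fixed rule does not share that blind spot.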
Detecting threats using statistical correlation does not require any preexisting knowledge of the attack to be detected. Statistical methods can, however, be used to detect threats based on predefined activity thresholds. Such thresholds can be configured based on experience gained monitoring the environment. For example, if a normal level of specific reconnaissance activity is exceeded for a prolonged period of time, the system might generate an alarm.

Correlation can also use various parameters for enterprise assets to weight the statistical algorithm for more accurate detection. Some of them are defined by system users (such as the value of the affected asset to the organization) or are automatically computed from other available event context data (such as vulnerability scanning results or a measure of normal user activity on the asset). That allows one to define a broader context for transpiring security events and thus helps one understand how they contribute to the organization’s risk posture.

If rule-based correlation is more helpful during threat identification, then algorithmic correlation is conducive to impact assessment. In the case of higher threat levels detected by the algorithms, one can assume that there is a higher chance of catastrophic system compromise or failure. Various statistical algorithms can be used to trend such threat levels over long periods of time to gain awareness of the normal network and host activities. The accumulated threat data is then used to compare the current patterns of activity with the baseline. This allows the system to make accurate (and possibly automated) decisions about event flows and their possible impact.

Challenges with Correlation
Both of the above types of correlation have inherent challenges, which fortunately can be mitigated by combining both methods to create coherent correlation coverage, leading to quality threat identification and ranking.

First, can we assume that the attacker will follow a scenario that can be caught by the rule-based correlation system? Unlike a network IDS, which needs a specific signature built on detailed knowledge of the attack, a correlation system rule might cover a broad range of malicious activities, especially if intelligent security event categorization is utilized. This can be done without going into the specifics of a particular IDS signature. For example, rules can be written to look for certain activities that usually accompany a system compromise, such as backdoor communication or downloading of hacker tools. An attacker who intends to use the compromised machine for his own purposes will find those activities difficult to avoid. Extensive research using deception networks (also called honeynets) allows one to learn more and more about attackers’ patterns of behavior and to encode them as correlation rules, available out of the box.

Second, can multiple rules cause the number of false positives to actually increase instead of decrease? Indeed, deploying many rules without any regard for the environment might generate false alarms. However, it is much easier to understand and tune the SIM correlation rules than intricate binary matching patterns. The latter require an in-depth understanding of attack network packets, memory corruption issues, and the specifics of exploitation techniques. On the other hand, tuning a correlation rule involves changing the timeouts and adding or removing conditions. Overall, in the case of correlation rules, one can also define response actions with higher confidence because one can bind the rules to a specific asset or group of assets.

Third, rule-based correlation is relatively computationally intensive. However, using highly optimized correlation engines and intelligently applying filters to limit the flow of events allows one to take maximum advantage of rule-based correlation. Additionally, many rules can be combined so that the correlation engine does not have to keep many similar events in memory. It also makes sense to apply more specific correlation rules to a large number of assets,

where a false positive flood might endanger the security, and to apply wider and more generic rules to critical assets, where an occasional false alarm is better than missing a single important alert. In this way, all the suspicious activities directed against a small group of critical assets will be detected, and [...]

Fourth, statistical correlation might not pick up anomalous activity if it is performed at low enough levels, essentially merging with the normal. Hiding attack patterns under large volumes of similar normal activity might deceive the statistical correlation system. Similarly, a single occurrence of an attack might not impact the statistical profile enough to be noticed. However, careful “baselining” of the environment and then using statistical methods to track deviations from such a baseline might allow one to detect some of the low-volume threats. Also, rule-based correlation compensates for such rare events and enables their detection, even if algorithmic correlation misses them.

MAXIMIZING THE BENEFITS OF CORRELATION
Correlation enables system users to take audit data analysis to the next level. Rule-based and statistical correlation allow the user to:
■ Dramatically decrease the response times for routine attacks and incidents using the centralized and correlated evidence storage
■ Completely automate the response to certain threats that can be detected reliably by correlation rules
■ Identify malicious and suspicious activities on the network even without having any preexisting knowledge of what to look for
■ Increase awareness of the network via baselining and trending and effectively “take back your network”
■ Fuse data from various information sources to gain a cross-device business risk view of the organization
■ Use the statistical correlation to learn the threats and then deploy new rules for site-specific and newly discovered violations

Overall, combining rules and algorithms provides the best value for managing an organization’s IT security risks.

CORRELATION RULE EXAMPLES
Probes Followed by an Attack
The rule watches for the general attack pattern consisting of reconnaissance activity followed by an exploit attempt. Attackers often use activities such as port scanning or application querying to scope the environment, find targets for exploitation, and get an initial picture of system vulnerabilities. After performing the initial information gathering, the attacker returns with exploit code or automated attack tools to obtain actual system penetration. The correlation enriches the information reported by the IDS and serves to validate the attack and suppress false alarms. By watching for exploit attempts that follow the reconnaissance activity from the same source IP address against the same destination machine, the SIM solution can increase both the confidence and accuracy of reporting.

After the reconnaissance event is detected by the system, the rule activates and waits for the actual exploit to be reported. If it arrives within a specified interval, the correlated event is generated. The notification functionality can then be used to relay the event to security administrators by email, pager, or cell phone, or to invoke appropriate actions.

Login Guessing
The rule watches for multiple failed authentication attempts to network and host services followed by a successful log-in attempt. While some intrusion detection systems are able to alert on failed log-in attempts, the correlation system is able to analyze such activity across all authenticated services, both networked (such as Telnet, SSH, FTP, Windows access, etc.) and local (such as UNIX and Windows console

log-ins). This rule is designed to track successful completion of such an attack. Triggering of this rule indicates that an attacker managed to log in to one of your servers.

It is well known that system users often choose passwords that are easy to guess in just a few tries. Intelligent automated guessing tools, available to hackers, allow them to cut the guessing time to a minimum. The tools use various tricks, such as trying to derive a password from a user’s log-in name, last name, etc. If those simple guessing attempts fail, hackers might resort to “brute-forcing” the password. This technique tries all possible combinations of characters (such as letters and numbers) as the password. After a non-root (non-administrator) user password is successfully obtained, the attacker will likely attempt to escalate to higher system privileges on the machine.

The rule activates after the first failed attempt is detected. The event counter is then incremented until the threshold level is reached. At that point, the rule engine will be expecting a successful log-in message. If such a message is received, the correlated event is sent. It is highly advisable to tune the count and the interval for the environment. Up to three failed attempts within several minutes are usually associated with users trying to remember a forgotten password, while higher counts within a shorter period of time might be more suspicious and indicate a malicious attempt or a script-based attack.

CONCLUSION
SIM products leveraging advanced correlation techniques and intelligent alert categorization are becoming indispensable as enterprises deploy more and more security point solutions, appliances, and devices. Those solutions alone address only small parts of a company’s security requirements and need to be integrated under the umbrella of a Security Information Management solution, which will enable users to combat modern-day technology threats such as hackers, hybrid worms, and even internal abuse.

1. Binary = here, not containing human-readable text, but binary data.
2. AUP = Acceptable Use Policy.

