Data Mining in Intrusion Detection
					Areej Al-Bataineh
    Data Mining Basics
      Definition
      Some techniques
             Association Rules
             Classification
             Clustering

    Data mining meets Intrusion Detection
      Detection Approaches
      Data mining use in IDS
      Case Study
             Behavioral Feature for Network Anomaly Detection
      Conclusions
4/26/2012                         Data Mining in Intrusion Detection   2
    Knowledge Discovery in Databases (KDD)
      “Process of extracting useful information from large databases”

    KDD basic steps
     1.       Understanding the application domain
     2.       Data integration and selection
     3.       Data mining
     4.       Pattern Evaluation
     5.       Knowledge representation

           Related Fields
             Machine learning, statistics, others


    “concerned with uncovering patterns, associations, changes, anomalies,
     and statistically significant structures and events in data”
    Why Data Mining?
      Understand existing data
      Predict new data

    Components
      Representation
            ▪ Decide what model we can build.
            ▪ Model is a compact summary of examples.

      Learning Element
            ▪ Builds a model from a set of examples

      Performance Element
            ▪ Applies the model to new observations


    Well-known and used in Intrusion Detection
      Association Rules [Descriptive]
      Classification [Predictive]
      Clustering [Descriptive]

    Preliminary step
      Raw Data -> Database Table (Training set)
      Columns – Attributes
      Rows – Records

    Motivated by market-basket analysis

    Generate Rules that capture implications between
     attribute values

    Rule Example
      Lettuce & Tomato -> Salad Dressing [0.4, 0.9]

    Parameters [s, c]
      Support (s) = % of records that satisfy both LHS and RHS
      Confidence (c) = P(satisfies RHS | satisfies LHS)

    Mining Problem
      “Find all association rules that have support and
       confidence > user-defined minimum value”
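The two rule parameters can be computed directly from a transaction table. A minimal sketch in Python, using made-up baskets (the item names and data are illustrative, not from the slides):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, lhs, rhs):
    """P(satisfies RHS | satisfies LHS) = support(LHS u RHS) / support(LHS)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical baskets for illustration
baskets = [
    {"lettuce", "tomato", "dressing"},
    {"lettuce", "tomato", "dressing"},
    {"lettuce", "tomato"},
    {"bread"},
]
# Rule: lettuce & tomato -> dressing
s = support(baskets, {"lettuce", "tomato", "dressing"})        # 0.5
c = confidence(baskets, {"lettuce", "tomato"}, {"dressing"})   # 2/3
```

A miner such as Apriori enumerates candidate itemsets and keeps only the rules whose support and confidence exceed the user-defined minima.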
    Predefined set of classes
    Training set has Class as one of the attributes
      Supervised Learning

    Mining Problem
      “Find a model for class attribute as a function of the values of other
            attributes”

    Use model to predict class
       for new records
    Classifier representation
      If-then Rules
      Decision Trees
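An if-then rule classifier can be sketched in a few lines. The rules and thresholds below are hypothetical, chosen only to illustrate how a learned model predicts a class for a new record:

```python
def classify(record, rules, default="normal"):
    """Apply if-then rules in order; the first condition that holds
    determines the predicted class."""
    for condition, label in rules:
        if condition(record):
            return label
    return default

# Hypothetical rules over packet-header attributes (illustrative thresholds)
rules = [
    # Reassembled size beyond the IP maximum suggests Ping of Death
    (lambda r: r["flags_MF"] == 1 and r["offset"] * 8 + r["total_len"] > 65535,
     "ping-of-death"),
    # Many SYNs without ACKs suggests a SYN flood
    (lambda r: r["syn"] == 1 and r["ack"] == 0 and r["syn_rate"] > 100,
     "syn-flood"),
]
```

In practice the rule list (or an equivalent decision tree) is learned from the training set rather than written by hand.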


       Given Data Set and Similarity Measure
            Unsupervised Learning

       Mining Problem
             “Group records into clusters such that records within a cluster are more
              similar to one another, and records in separate clusters are less similar
              to one another”

       Similarity Measures:
            Euclidean Distance if attributes are continuous.
            Other Problem-specific Measures.

       Clustering Methods
            Partitioning
             ▪ Divide data into disjoint partitions
            Hierarchical
             ▪ Root is complete data set, Leaves are individual records, and Intermediate layers -> partitions
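The partitioning method with a Euclidean similarity measure can be sketched as a from-scratch k-means loop (a minimal illustration, not a production implementation):

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest center,
    then recompute each center as the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters
```

Hierarchical methods instead build a tree of partitions, merging or splitting clusters level by level.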
    Detection Approach
      Misuse Detection
            ▪ Based on known malicious patterns
              (signatures)
      Anomaly Detection
            ▪ Based on deviations from established
              normal patterns (profiles)

    Data Source
      Network-based (NIDS)
            ▪ Network traffic
      Host-based (HIDS)
            ▪ Audit trails

    Signature extraction
    Rule matching
    Alarm data analysis
      Reduce false alarms
      Eliminate redundant alarms


    Feature selection
    Training Data cleaning

    Behavioral Feature for Network Anomaly
     Detection
           Training set = normal network traffic
           Feature provides semantics of the values of data
           Feature selection is important
           Proposed method:
            ▪ Feature extraction based on protocol behavior
            ▪ Many attacks use the protocol improperly
              ▪ Ping of Death
              ▪ SYN Flood
              ▪ Teardrop

    Attributes
           packet header fields

    Feature
      Single or multiple attributes

    Protocol Specifications
      Policy for interaction
      Define attributes and the range of values

    Flow
      Collection of packets exchanged between entities engaged in
       protocol
      Client/Server flows
   Inter-Flow vs Intra-Flow Analysis (IVIA)
   First step
     Identify attributes used in partitioning traffic data into flows -> Src/Dst ports
     Result: HTTP flows, DNS flows, …etc

   Next Step
     Examine change of attribute values
            ▪ Between flows (inter-flow)
            ▪ Within a flow (intra-flow)
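The first step, partitioning traffic into flows, amounts to grouping packets by a flow key. A minimal sketch (the packet field names are assumptions, not from the slides):

```python
from collections import defaultdict

def partition_into_flows(packets):
    """Group packets into flows keyed by
    (src addr, src port, dst addr, dst port, protocol)."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p["src"], p["sport"], p["dst"], p["dport"], p["proto"])].append(p)
    return dict(flows)

# Label flows by well-known server port, e.g. 80 -> HTTP, 53 -> DNS
SERVICES = {80: "HTTP", 53: "DNS"}

def service_of(flow_key):
    return SERVICES.get(flow_key[3], "other")
```

Inter-flow analysis then compares attribute values across the resulting flows, while intra-flow analysis tracks them within each packet list.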

   Results

                                  Intra-flow changes: Yes             Intra-flow changes: No
    Operationally variable        IHL, Service Type, Total Length,    Source Add, Destination Add,
    attributes                    Identification, Flags_DF,           Protocol
    (inter-flow changes: Yes)     Flags_MF, Fragment Offset,
                                  Time to Live, Options
    Operationally invariant       (none)                              Version, Flags_reserved
    (inter-flow changes: No)
    Uses 1999 DARPA IDS Evaluation data set
    Build association rules for IP fragments using operationally variable attributes (OVAs)
    Result - Top 8 ranking rules
             Rule                                           Support   Strength
             ipFlagsMF = 1 & ipTTL = 63 -> ipTLen = 28      0.526     0.981
             ipID < 2817 & ipFlagsMF = 1 -> ipTLen > 28     0.309     0.968
             ipID < 2817 & ipTTL > 63 -> ipTLen > 28        0.299     1.000
             ipTLen > 28 -> ipID < 2817                     0.309     1.000
             ipID < 2817 -> ipTLen > 28                     0.309     0.927
             ipTTL > 63 -> ipTLen > 28                      0.299     0.988
             ipTLen > 28 -> ipTTL > 63                      0.299     0.967
             ipTLen > 28 & ipOffset > 118 -> ipTTL > 63     0.291     1.000

    Transform OVAs into features that capture the protocol behavior
    Behavior features
      Attribute observed over time/event

    For an attribute observe
           Entropy
           Mean and standard deviations
            Percentage of events within a value
            Percentage of events that are monotonic
           Step size in attribute value

    Training data requirements are reduced
    Normal – acceptable uses of the protocol
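The per-attribute statistics listed above can be sketched directly. A minimal version, assuming the attribute's observed values are already collected per flow or per window:

```python
import math

def entropy(values):
    """Shannon entropy (in bits) of an attribute's observed values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_std(values):
    """Mean and (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

def pct_monotonic(values):
    """Fraction of consecutive steps that are non-decreasing,
    e.g. for the IP identification field."""
    steps = list(zip(values, values[1:]))
    return sum(1 for a, b in steps if b >= a) / len(steps)
```

Because these features describe how an attribute behaves rather than its literal values, a model of "normal" protocol use needs far less training data to cover acceptable behavior.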

    Uses aggregate attribute values for some window of packets
           Window size = 10
           Examples
            ▪   TcpPerFIN = % of packets with FIN set
            ▪   meanIAT = Mean inter-arrival time
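That aggregation over a packet window can be sketched as follows, assuming each packet record carries a timestamp `ts` and a `fin` flag (field names are assumptions for illustration):

```python
def window_features(packets, window=10):
    """Compute aggregate features over consecutive windows of packets:
    tcpPerFIN (% of packets with FIN set) and meanIAT (mean inter-arrival
    time within the window)."""
    feats = []
    for start in range(0, len(packets) - window + 1, window):
        w = packets[start:start + window]
        per_fin = sum(p["fin"] for p in w) / window
        iats = [b["ts"] - a["ts"] for a, b in zip(w, w[1:])]
        feats.append({"tcpPerFIN": per_fin, "meanIAT": sum(iats) / len(iats)})
    return feats
```

Each window then becomes one training record for the classifier.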

    50 flows for each protocol = 250 flows
    Number of packets per flow (5 – 37000)
    Use decision tree classifier (C5)
            ▪   FTP, SSH, Telnet, SMTP, HTTP

    Classifier tested on DARPA data set
           FTP             SSH          Telnet         SMTP         WWW
           100%            100%         100%           82%          98%

    Real Network Traffic (85% - 100%)
           Kazaa
           100 %


    Decision tree (excerpt):

      tcpPerFIN > 0.01:
          tcpPerPSH <= 0.4  -> WWW
          tcpPerPSH > 0.4:
              tcpPerPSH <= 0.79 -> SMTP
              tcpPerPSH > 0.79  -> FTP
      tcpPerFIN <= 0.01:
          meanIAT > 546773:
              tcpPerSYN <= 0.03 -> telnet
              tcpPerSYN > 0.03:
                  meanipTLen > 73  -> tcpPerPSH (> 0.79 -> SMTP, ...)
                  meanipTLen <= 73 -> ...
          meanIAT <= 546773 -> ...



    Behavioral Features for Network Anomaly Detection
           Attribute values cannot be used as features
           Interpretation of protocol specifications
           Transform attributes into behavior features
           aggregation of the attribute values

    Data Mining Challenges
           Self-tuning data mining techniques
           Pattern-finding and prior knowledge
           Modeling of temporal data
           Scalability
           Incremental mining
    Tools
      Kdnuggets
            ▪ Web portal http://www.kdnuggets.com

      WEKA
            ▪ Most comprehensive and free collection of tools
            ▪ http://www.cs.waikato.ac.nz/ml/weka

    Data Sets
      Machine Learning Database Repository
      Knowledge Discovery in Databases Archive
            ▪ http://kdd.ics.uci.edu

      MIT Lincoln Labs
            ▪ http://www.ll.mit.edu/IST/ideval


    “Applications of Data Mining in Computer Security” By Barbara
     and Jajodia
    “Machine Learning and Data Mining for Computer Security” By
     Maloof
    “Data Mining: Challenges and Opportunities for Data Mining
     During the Next Decade” By Grossman
    “Data Mining: Concepts and Techniques” By Han and Kamber
    SANS IDS FAQs
      https://www2.sans.org/resources/idfaq/

    ACM Crossroads: IDS
      http://www.acm.org/crossroads/xrds2-4/intrus.html

    OLD
           Represent rules as a decision tree in memory
           Very inefficient
           Matching speed is linear in the number of rules
           Rule sets are growing fast
    New
           Multi-pattern search algorithm
           Apply multiple rules in parallel
           Set-wise methodology
           Fire rule with the longest match
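A toy illustration of the set-wise, longest-match idea: evaluate all patterns against the payload and fire the longest one that matches. (This per-pattern scan is only for illustration; a production engine would use a multi-pattern automaton such as Aho-Corasick to test all rules in a single pass.)

```python
def best_match(payload, patterns):
    """Check every signature pattern against the payload and return the
    longest one that matches, or None if nothing matches."""
    hits = [p for p in patterns if p in payload]
    return max(hits, key=len) if hits else None
```

With the old one-rule-at-a-time tree, cost grows with the rule count; set-wise matching keeps per-packet cost closer to the payload length.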

				