Misleading Worm Signature Generators Using Deliberate Noise Injection

Roberto Perdisci+^, David Dagon^, Wenke Lee^,
        Prahlad Fogla^, Monirul Sharif^
  ^ Georgia Institute of Technology, Atlanta, GA, USA
  + University of Cagliari, ITALY

                   Presented by: Roberto Perdisci



                                                          1
Outline
   Introduction
   Syntactic Worm Signature Generators
   “Traffic-based” flow classifiers
   Noise Injection Attack
   Case study: Misleading POLYGRAPH
   Experimental Results
   Conclusion
                                          2
    Introduction
   Automatic signature generators look for invariant
    parts of polymorphic worms
        Syntax-based: Network packets/flows content inspection
        Example of a syntactic worm signature:
           GET .* HTTP/1.1 .* Host: .* Host: .* \xFF\xBF .*


   Our contribution:
        Syntactic signature generators are vulnerable to
         the Noise Injection Attack
        The attack forces the generation of useless signatures
                                                                  3
              Syntactic
              Worm Signature Generators
   [Figure: signature generator. Normal traffic + worm flows -> Network tap -> Flow classifier -> Suspicious flow pool -> Signature generation -> Firewall/NIDS. The Firewall/NIDS sits on the live traffic behind a network tap and stops the worm before it reaches the protected network]
   Traffic-based flow classifiers
           Simulated Honeynet
           Double Honeynet (“real” honeynet + sim. honeynet)
           Port-scanning detector
           Anomaly IDS (e.g., byte frequency-based classifiers)
                                                                                               4
Traffic-based flow classifiers
   [Figure: four traffic-based flow classifiers, each storing the traffic coming from A in a Suspicious Flow Pool: Simulated Honeynet, Double Honeynet (Layer-1 and Layer-2 honeynets behind a gateway, GW), Port-scanning detection, Anomaly IDS]
   References shown on the slide:
       C. Kreibich et al., Honeycomb: creating intrusion detection signatures using honeypots. Hot Topics in Networks, 2003.
       V. Yegneswaran et al., An architecture for generating semantics-aware signatures. USENIX Security, 2005.
       Y. Tang et al., Defending against internet worms: A signature-based approach. IEEE INFOCOM, 2005.
       H.A. Kim et al., Autograph: Toward automated, distributed worm signature detection. USENIX Security, 2004.

                                                                               5
 Noise Injection Attack
      The Noise Injection Attack “poisons” the suspicious flow
       pool dataset with fake anomalous flows (F.A.F.)
      Fake anomalous flows do not need to exploit
       the vulnerability
   [Figure: the worm propagates from A to B across the Internet together with fake anomalous flows. Live traffic (Worm + F.A.F.) -> tap -> Flow classifier -> Suspicious flow pool (Worm + F.A.F.) -> Signature generation -> Useless signatures (too many FP and/or FN)]
                                                                                6
Injecting
Fake Anomalous Flows
   [Figure: the same four flow classifiers (Simulated Honeynet, Double Honeynet with Layer-1 and Layer-2 honeynets behind a gateway GW, Port-scanning detection, Anomaly IDS), each storing the traffic coming from A in its Suspicious Flow Pool]

                                                                     7
        Case study: POLYGRAPH
       Signature generation for polymorphic worms
   [Figure: Worm + F.A.F. -> Network tap -> Flow classifier -> Suspicious flow pool (noisy data) and Innocuous flow pool -> Signature generation -> Worm signatures (too many FP and/or FN)]

       The flow classification technique was not specified
       The authors assumed that the flow classifier is not
        perfect
            Noise could end up in the suspicious pool
       J. Newsome et al.: “even in presence of noise
        Polygraph generates high-quality signatures”
             This is not true if the noise is deliberately
              well-crafted and injected by the attacker
                                                                               8
 Case study: POLYGRAPH
    Polygraph generates 3 different types of signatures:
           Conjunction, Token-subsequence, Bayes
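   [Figure: worm variants A, B and C. Each flow is made of worm body, Protocol Framework (PF) and True Invariants (TI). The true invariants appear in the order 1, 2, 3 in variants A and C, and in the order 2, 1, 3 in variant B]

    Conjunction Signature:          {PF, TI-1, TI-2, TI-3}
    Token-subsequence Signature:    PF .* TI-1 .* TI-3 .*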
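
A minimal sketch of how the two signature types above could be matched against a flow. The function names, the toy flows and the use of the literal strings "TI-1", "TI-2", "TI-3" as stand-ins for the true invariants are assumptions made for illustration; they are not taken from POLYGRAPH.

```python
import re
from typing import Sequence


def matches_conjunction(flow: bytes, tokens: Sequence[bytes]) -> bool:
    """Conjunction signature: every token must occur, in any order."""
    return all(token in flow for token in tokens)


def matches_token_subsequence(flow: bytes, tokens: Sequence[bytes]) -> bool:
    """Token-subsequence signature: the tokens must occur in the given order."""
    pattern = b".*".join(re.escape(token) for token in tokens)
    return re.search(pattern, flow, flags=re.DOTALL) is not None


# Hypothetical flows: "HTTP/1.1" plays the role of the protocol framework (PF).
variant_a = b"GET /a HTTP/1.1 xx TI-1 yy TI-2 zz TI-3"
variant_b = b"GET /b HTTP/1.1 TI-2 qq TI-1 rr TI-3"
innocuous = b"GET /index.html HTTP/1.1 Host: example.org"

conjunction = [b"HTTP/1.1", b"TI-1", b"TI-2", b"TI-3"]
subsequence = [b"HTTP/1.1", b"TI-1", b"TI-3"]

print(matches_conjunction(variant_b, conjunction))        # order does not matter
print(matches_token_subsequence(variant_b, subsequence))  # TI-1 precedes TI-3
print(matches_token_subsequence(variant_b, conjunction))  # TI-2 is not after TI-1
```

The last call shows why the token-subsequence signature keeps only TI-1 and TI-3: variant B carries TI-2 before TI-1, so an ordered signature containing all three invariants would miss it.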
                                                                            9
 Case study: POLYGRAPH
     Conjunction and Token-subsequence signatures are
      not resilient to noise in the Suspicious flow pool
   [Figure: the suspicious flow pool contains Worm A, an innocuous flow, Worm B and Worm C. Without clustering, the Conjunction and Token-subsequence signatures are extracted from all four flows together]
     The resulting signatures produce too many FP
     The signatures will be disregarded
                                                                10
 Hierarchical Clustering
   [Figure: Clustering splits the suspicious flow pool (Worm A, innocuous flow, Worm B, Worm C) into Suspicious Cluster 1 (the innocuous flow) and Suspicious Cluster 2 (Worm A, Worm B, Worm C)]
   The Conjunction and Token-subsequence signatures extracted from
    Suspicious Cluster 2 match new worm variants!
                                                                            11
        Misleading Conjunction and
        Token-Subsequence Signatures
       Objective: Craft the fake anomalous flows so that the
        Hierarchical Clustering cannot filter the noise
           the extracted signatures will produce False Negatives
   [Figure: layout of a worm flow and of a fake anomalous flow. Legend: worm body, protocol framework, true invariants, permuted bytes, fake invariants]

      P(FI | innocuous flow) < P(TI | innocuous flow)
         => P(false positive | sig(FI)) < P(false positive | sig(TI))
                                                                           12
Misleading
Hierarchical Clustering
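   [Figure: the suspicious flow pool contains the worm variants W-A, W-B, W-C and the fake anomalous flows F-A, F-B, F-C. Clustering does not separate the fake anomalous flows from the worm flows, and the Conjunction and Token-subsequence signatures are extracted from W-A, F-A, W-B, F-B, W-C, F-C together]
   Useless Signatures!
   The signatures do not contain True Invariants
             too many False Negatives!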
                                                                            13
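             Case study: POLYGRAPH
            Bayes signatures: all the tokens common to at least K out of the
             N suspicious flows are extracted
                    For each token $t_j$
                        $P_{sf} = P(t_j \mid \text{Suspicious Flow})$
                        $P_{if} = P(t_j \mid \text{Innocuous Flow})$
                        $\lambda_j = \log(P_{sf} / P_{if})$
            Signature: {<PF, $\lambda_{PF}$>, <TI-1, $\lambda_{TI-1}$>, <TI-2, $\lambda_{TI-2}$>, <TI-3, $\lambda_{TI-3}$>}
            Score of a flow: $L = \sum_j \lambda_j$
                    If $L > \theta$: the flow is a worm!
                    If $L < \theta$: the flow is innocuous!
   [Figure: a worm variant containing tokens 1, 2 and 3 is classified as a worm; an innocuous flow containing only token 2 is classified as innocuous]

                $\theta$ is computed during training to obtain high DR and low FP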
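
A rough sketch of the Bayes signature described on this slide: one log-ratio score per token, and a flow flagged as a worm when the sum of the scores of the tokens it contains exceeds the threshold. Token extraction is not shown (the candidate tokens are passed in), the add-one smoothing of the innocuous probability is an assumption of this sketch to avoid division by zero, and the natural logarithm, the toy flows and the threshold value 1.0 are likewise hypothetical.

```python
import math


def train_bayes_signature(suspicious, innocuous, candidate_tokens, k):
    """Return {token: score} for tokens found in at least k suspicious flows."""
    signature = {}
    for token in candidate_tokens:
        n_suspicious = sum(token in flow for flow in suspicious)
        if n_suspicious < k:
            continue
        p_sf = n_suspicious / len(suspicious)
        n_innocuous = sum(token in flow for flow in innocuous)
        # Add-one smoothing: an assumption of this sketch, not part of the slide.
        p_if = (n_innocuous + 1) / (len(innocuous) + 1)
        signature[token] = math.log(p_sf / p_if)
    return signature


def score(flow, signature):
    """Sum of the scores of the signature tokens that occur in the flow."""
    return sum(s for token, s in signature.items() if token in flow)


def is_worm(flow, signature, threshold):
    return score(flow, signature) > threshold


# Hypothetical training pools.
suspicious = [b"GET /a HTTP/1.1 \xff\xbf", b"GET /b HTTP/1.1 \xff\xbf x"]
innocuous = [b"GET / HTTP/1.1", b"GET /i HTTP/1.1",
             b"POST / HTTP/1.1", b"GET /z HTTP/1.0"]
tokens = [b"GET", b"HTTP/1.1", b"\xff\xbf"]

sig = train_bayes_signature(suspicious, innocuous, tokens, k=2)
print(is_worm(suspicious[0], sig, threshold=1.0))
print(is_worm(innocuous[0], sig, threshold=1.0))
```

In this toy setting the protocol tokens receive small scores because they are also frequent in innocuous traffic, while the invariant bytes receive a large one, so a threshold between the two separates the pools.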
                                                                                     14
       Misleading Bayes Signatures
      Consider a “normal” HTTP string $\nu$ = “Pragma: no-cache”
          Suppose $P(\nu \mid \text{Innocuous Flow}) = 0.10$
      Suppose the worm injects substrings of $\nu$ into all the
       fake anomalous flows
      Bayes signature:
          <GET, $\lambda_1$>; <HTTP/1.1, $\lambda_2$>; … ; <\xFF\xBF, $\lambda_{TI}$>;
          <Pragma: no, $\lambda_{\nu_1}$>; <ragma: no-, $\lambda_{\nu_2}$>; … <: no-cache, $\lambda_{\nu_7}$>;
          $\lambda_{\nu_i} \approx \lambda_\nu = \log(0.5/0.1)$
      Score multiplier effect for innocuous flows which
       contain $\nu$
          Innocuous flow: GET .* HTTP/1.1 .* Pragma: no-cache .*
          $L = \sum_k \lambda_k + \sum_i \lambda_{\nu_i}$, so $L \gg \sum_k \lambda_k$
                                                                15
            Misleading Bayes Signatures
     Normal HTTP string $\nu$ = “Pragma: no-cache”
     The worm injects substrings of $\nu$ into all the F.A.F.
     Bayes signature:
         <GET, $\lambda_1$>; <HTTP/1.1, $\lambda_2$>; … ; <\xFF\xBF, $\lambda_{TI}$>;
         <Pragma: no, $\lambda_{\nu_1}$>; <ragma: no-, $\lambda_{\nu_2}$>; … <: no-cache, $\lambda_{\nu_7}$>;
     Score multiplier effect
         Innocuous flow: GET .* HTTP/1.1 .* Pragma: no-cache .*
         $L = \sum_k \lambda_k + \sum_i \lambda_{\nu_i}$, so $L \gg \sum_k \lambda_k$

                     Objective of the attack:
      Inject “normal” substrings into the fake anomalous flows
        so that POLYGRAPH cannot find a “good” threshold $\theta$

       too many FALSE POSITIVES or FALSE NEGATIVES!
                                                                       16
         Crafting the Noise

   [Figure: layout of a worm flow and of a fake anomalous flow. Legend: worm body, protocol framework, true invariant, permuted bytes, fake invariant, score multiplier strings]



        The Fake Invariants are specific for each worm and its fake
         anomalous flows

        The score multiplier strings have to be common to all the fake
         anomalous flows

                                                                            17
 Experimental results
   Experimental Setup
     We implemented POLYGRAPH according to the description in [1]
     “Apache-Knacker” HTTP Worm:
 worm GET .* HTTP/1.1\r\n.*\r\nHost: .*\r\n .*\r\nHost:
      .*\xFF\xBF.*\r\n
   Training dataset
        Suspicious flow pool = 10 worm variants
        Innocuous flow pool = 100,459 HTTP requests (0.007% FP)
   Test dataset
        “Suspicious” Test flow pool = 100 worm variants
        “Normal” Test flow pool = 217,164 HTTP requests (0.0% FP)
   Attacker’s dataset
        300 Candidate Score Multiplier Strings extracted from 5,000 flows
[1] J. Newsome, B. Karp, and D. Song.
    Polygraph: Automatically generating signatures for polymorphic worms.
    In Proceedings of the IEEE Symposium on Security and Privacy, May 2005.
                                                                        18
Experimental results with
Bayes signatures
 Score Multip. Strings (m=4): “Pragma: no-cache”, “-powerpoint”
 [Figure: plot for Bayes signatures with three curves: no noise, 1 fake flow, 2 fake flows]




                                                                 19
 Experimental results with
 Bayes signatures
    Score Multip. Strings (m=4): “Pragma: no-cache”, “-powerpoint”
    [Figure: plot for Bayes signatures with three curves: no noise, 1 fake flow, 2 fake flows]

Under attack it is impossible to find a threshold
 that produces low FP and FN rates at the same time

                                                                    20
Experimental results with
all the 3 types of signatures
   20 rounds - The attack is considered successful if
       Conjunction and Token-subsequence produce 100% False
        Negatives
       Bayes produces more than 1% of False Positives

                             1 F.A.F./worm    2 F.A.F./worm

    Conjunction                   65%              95%
    Token-subsequence             40%              90%
    Bayes                         90%             100%
    All the 3 signatures          20%              85%
                                                           21
        Conclusion
   The Noise Injection Attack has a high chance
    of misleading syntactic worm signature
    generators
       Forces extraction of useless signatures!

   Need a precise flow classifier that can
    effectively filter the noise
       Open problem… (to be solved!)

                                                   22
Thank you!




             23
          Misleading Conjunction and
          Token-Subsequence Signatures
   Wi = i-th Worm variant
   Fi = i-th Fake anomalous flow
   PF = Protocol Framework
   TI = True Invariant
   FI = Fake Invariant

   P(FP | FI) < P(FP | TI)

   Tokens shared by each pair of flows:

               F1           W2           F2           W3           F3
      W1    PF + FI-1    PF + TI      PF           PF + TI      PF
      F1                 PF           PF           PF           PF
      W2                              PF + FI-2    PF + TI      PF
      F2                                           PF           PF
      W3                                                        PF + FI-3



                                                                                                            24
          Misleading Conjunction and
          Token-Subsequence Signatures
   Wi = i-th Worm variant
   Fi = i-th Fake anomalous flow
   PF = Protocol Framework
   TI = True Invariant
   FI = Fake Invariant

   P(FP | FI) < P(FP | TI)

   Tokens shared by each pair of clusters after W1 and F1 are merged:

                     W2           F2           W3           F3
      [ W1 , F1 ]    PF           PF           PF           PF
      W2                          PF + FI-2    PF + TI      PF
      F2                                       PF           PF
      W3                                                    PF + FI-3

   Final clusters: [ W1 , F1 ]  [ W2 , F2 ]  [ W3 , F3 ]

   Min num of flows = 3: NO SIGNATURE!
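
The merging shown on these two slides can be reproduced with a simplified greedy clustering sketch: at each step the two clusters whose common tokens give the lowest estimated false positive rate are merged, and merging stops when that rate exceeds a limit. This is an illustration under stated assumptions, not POLYGRAPH's exact procedure: the per-token false positive values, the independence assumption used to combine them, the limit `max_fp` and all function names are hypothetical.

```python
from itertools import combinations
from math import prod


def signature(cluster, flows):
    """Tokens shared by every flow in the cluster."""
    return frozenset.intersection(*(flows[name] for name in cluster))


def fp_rate(sig, token_fp):
    """Estimated FP rate of a signature, assuming independent tokens."""
    return prod(token_fp[token] for token in sig)


def greedy_clustering(flows, token_fp, max_fp):
    clusters = [(name,) for name in flows]
    while len(clusters) > 1:
        # Pick the pair whose merged signature has the lowest estimated FP rate.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: fp_rate(signature(p[0] + p[1], flows), token_fp))
        if fp_rate(signature(a + b, flows), token_fp) > max_fp:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters


# Hypothetical values chosen so that P(FP | FI) < P(FP | TI).
token_fp = {"PF": 0.9, "TI": 1e-3, "FI-1": 1e-4, "FI-2": 1e-4, "FI-3": 1e-4}
flows = {}
for i in (1, 2, 3):
    flows[f"W{i}"] = frozenset({"PF", "TI", f"FI-{i}"})  # worm variant
    flows[f"F{i}"] = frozenset({"PF", f"FI-{i}"})        # fake anomalous flow

clusters = greedy_clustering(flows, token_fp, max_fp=0.01)
min_flows = 3
usable = [c for c in clusters if len(c) >= min_flows]
print(clusters)
print(usable)
```

With these values each worm variant is paired with its own fake anomalous flow, no cluster reaches the minimum of 3 flows, and no signature is generated.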
                                                                                                            25
           The results are not
           Deterministically Predictable
    Wi = i-th Worm variant
    Fi = i-th Fake anomalous flow
    PF = Protocol Framework
    TI = True Invariant
    FI = Fake Invariant
    C = Common Token by chance!

    P(FP | FI) < P(FP | TI), but P(FP | FI) >= P(FP | TI, C)

    Tokens shared by each pair of flows:

                F1           W2             F2           W3           F3
       W1    PF + FI-1    PF + TI + C    PF           PF + TI      PF
       F1                 PF             PF           PF           PF
       W2                                PF + FI-2    PF + TI      PF
       F2                                             PF           PF
       W3                                                          PF + FI-3

    Clustering: [ W1 , W2 ] F1 F2 W3 F3, then [ W1, W2, W3 ] F1 F2 F3
    Good signature! A useful signature could be produced, by chance

    However, our experiments showed that the
     Noise Injection Attack has a high probability of success
                                                                                                        26
Misleading Bayes Signatures
   Consider a string $\nu$ of length $n$ that is present in the
    Innocuous pool with probability
       $0.05 < P(\nu \mid \text{Innocuous Flow}) < 0.20$
   If all the fake anomalous flows contain $\nu$, the string
    will be present in 50% of the suspicious flows
   Thus, the extracted signature will contain $\nu$ and the
    related score $\lambda_\nu$ will be
       $\log(0.5/0.20) < \lambda_\nu < \log(0.5/0.05)$
   This means that an innocuous flow containing $\nu$ will
    receive a total score $L \ge \lambda_\nu$
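
As a quick numeric check of the bounds above, the score of the injected string can be evaluated at the two ends of the probability range. The slide does not state the base of the logarithm; the natural logarithm is assumed here, and `token_score` is a hypothetical helper name.

```python
import math


def token_score(p_suspicious: float, p_innocuous: float) -> float:
    """Score of a token: log of suspicious over innocuous probability."""
    return math.log(p_suspicious / p_innocuous)


lower = token_score(0.5, 0.20)
upper = token_score(0.5, 0.05)
print(f"{lower:.3f} < score < {upper:.3f}")  # 0.916 < score < 2.303
```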

                                                        27
    “Score multiplier” effect
   Splitting $\nu$ into all its substrings of length $m < n$
        E.g., “Pragma: no”, “ragma: no-”, “agma: no-c”, etc.
        If $m$ is not much smaller than $n$, then $p(\nu_{i,i+m} \mid IF) \approx p(\nu \mid IF), \forall i$
   If all the substrings $\nu_{i,i+m}$ of $\nu$ are injected into the fake
    anomalous flows, they will all be considered as tokens in
    the signature (50% occurrence freq.)
        $\lambda_{\nu_{i,i+m}} \approx \lambda_\nu$
   An innocuous flow that contains $\nu$ also contains all the
    $\nu_{i,i+m}$ tokens
   A score multiplier effect is obtained for the innocuous
    flows which contain $\nu$
        GET .* HTTP/1.1 .* Pragma: no-cache .*
        $L = \sum_k \lambda_k + \sum_i \lambda_{\nu_{i,i+m}} \gg \lambda_\nu$; if $L > \theta$: false positive
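
The score multiplier effect can be illustrated with the string from the slides. The sketch below splits “Pragma: no-cache” into all its substrings of length 10 and adds up the score that an innocuous flow containing the full string would collect. The per-token score log(0.5/0.10) follows the earlier example with P = 0.10; the natural logarithm, the toy flow and the helper name `substrings` are assumptions.

```python
import math


def substrings(s: str, m: int) -> list:
    """All contiguous substrings of s with length m."""
    return [s[i:i + m] for i in range(len(s) - m + 1)]


string = "Pragma: no-cache"
tokens = substrings(string, 10)
print(tokens[0], "|", tokens[-1], "|", len(tokens))

# Each injected token gets roughly the same score as the full string.
token_score = math.log(0.5 / 0.10)

# Hypothetical innocuous HTTP request that happens to contain the string.
flow = "GET /index.html HTTP/1.1\r\nPragma: no-cache\r\n"
extra = sum(token_score for token in tokens if token in flow)
print(f"single token: {token_score:.2f}, all tokens: {extra:.2f}")
```

A single occurrence of the string in an innocuous flow is counted once per injected substring, which is what pushes the total score of that flow towards the threshold.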
                                                                 28
Combining our attack with the
Red Herring
   Increases the probability of success
       The worm includes some temporary invariants
       These invariants expire over time
       This means that even if POLYGRAPH generates useful
        Conjunction and Token-subsequence signatures, after
        a while they will become useless
       The second time POLYGRAPH generates the signature,
        it may not be as “fortunate” as the first time
       Further, if the temporary invariants are chosen among
        “high frequency normal tokens”, the combination of
        the attacks will not interfere with the attack on
        Bayes Signatures

                                                          29
Experimental results with
Bayes Signatures




                            30
Experimental results with
all the 3 types of signatures
   The results are not deterministically predictable (due
    to tokens that are common just by chance)
       Simulations: 2 groups of tests
            For each group of tests we simulated 2 scenarios:
                1 F.A.F./worm and 2 F.A.F./worm

            1st group of tests: 45 rounds
            2nd group of tests: 20 rounds




                                                                 31
Possible countermeasures
   White list
   “Coloring” technique




                           32