Slide 1 - The Team for Research in Ubiquitous Secure Technology

					DETECTION OF ATTACKS ON
  COGNITIVE CHANNELS
                 Annarita Giani

    Institute for Security Technology Studies
          Thayer School of Engineering
                Dartmouth College
                   Hanover, NH



                Berkeley, CA
               October 12, 2006
                   Outline

1. Motivation and Terminology
2. Process Query System (PQS) Approach
3. Implementation of a PQS detecting
  a. Phishing
  b. Data Exfiltration
  c. Covert Channel
4. Flow Attribution and Aggregation
5. Conclusion and Acknowledgments
                                Malware and Detection

Detection timeline:
  • 70s. System admins directly monitor user activities.
  • Late 70s - early 80s. System admins review audit logs for evidence of unusual behavior.
    Programs analyze audit logs, usually at night.
  • 90s. Real-time IDS.
  • 1990s. First commercial antivirus.
  • 1991. Norton Antivirus released by Symantec.

Malware timeline:
  • 1940. Von Neumann studied self-reproducing mathematical automata.
  • 1945. Grace Hopper. MIT - first computer bug.
  • 1951. Von Neumann demonstrated how to create self-reproducing automata.
  • 1959. Penrose: self-reproducing machines.
  • ~1960. Stahl reproduces Penrose idea in machine code on an IBM 650.
  • 1970. Computer viruses on ARPANET.
  • 1982. First virus in the wild.
  • 1988. Morris worm.
  • 1990s. Malicious programs exploit vulnerabilities in applications and operating systems.
  • 1999. Melissa virus, damage = $80 M.
  • 2001. Code Red worm, damage = $2 B.
  • Now. Phishing attacks, misinformation, multi-stage attacks, web defacements, covert channels, exfiltration of information.

Bands along the timeline, left to right: THEORETICAL WORK, FOCUS OF MOST SECURITY WORK, OUR FOCUS.

Intrusion Detection Systems (IDS) are mainly based on signature matching and anomaly detection.

  1972: J.P. Anderson, Computer Security Technology Planning Study, ESD-TR-73-51, ESD/AFSC, Bedford, MA
  1984: D. Denning, An Intrusion Detection Model, IEEE Transactions on Software Engineering, Vol SE-13(2)
  1988: M. Crosby, Haystack Project, Lawrence Livermore Laboratories
  1989: Stalker, a commercial product from the Haystack Project. First HIDS
  1990: L. Heberlein et al, A Network Security Monitor, Symposium on Research Security and Privacy. First NIDS
  1994: ASIM (Air Force), Netranger. First commercial NIDS.
                 Cognitive Channels
   A cognitive channel is a communication channel between the
   user and the technology being used. It conveys what the user
   sees, reads, hears, types, etc.


           SERVER <-- Network Channel --> CLIENT <-- Cognitive Channel --> USER

           [Focus of the current protection and detection approaches: the SERVER-CLIENT network channel]

The cognitive channel is the weakest link in the whole framework.
Little investigation has been done on detecting attacks on this channel.
                  Cognitive Attacks
 Our definition is from an engineering point of view.

Cognitive attacks are computer attacks over a cognitive channel.
They exploit the attention of the user to manipulate her
perception of reality and/or gain advantages.


COGNITIVE HACKING. The user’s attention is focused on the channel. The
attacker exploits this fact and uses malicious information to mislead her.

COVERT CHANNELS. The user is unaware of the channel. The attacker uses
a medium not perceived as a communication channel to transfer information.

PHISHING. The user's attention is attracted by the exploit. The information is
used to lure the victim into using a new channel and then to create a false
perception of reality with the goal of exploiting the user’s behavior.
                     Cognitive Hacking
The user's attention is focused on the channel. The attacker exploits
this fact and uses malicious information in the channel to mislead her.


1. Attacker: Makes a fake web site.
2. Misleading information from a web site reaches the victim.
3. Victim: Acts on the information from the web site.
4. Attacker: Obtains advantages from user actions.
                        Covert Channels
 The user is unaware of the channel. The attacker uses a
 medium not perceived as a communication channel to transfer
 information.




1. Attacker: Codes data into inter-packet delays, taking care to avoid drawing the attention of the user.
2. The data is transferred. User: does not see inter-packet delay as a communication channel and does not notice any communication.
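The timing channel described on this slide can be sketched in a few lines. This is an illustrative sketch only, not the channel studied in the talk: the two delay values, the threshold, and the function names `encode` and `decode` are assumptions chosen for the example. Each bit of the message becomes one inter-packet delay, short for 0 and long for 1.

```python
# Hypothetical delay values in seconds; a real channel would pick values
# that blend in with normal traffic jitter.
SHORT, LONG = 0.05, 0.15


def encode(message: bytes) -> list:
    """Map each bit of the message (most significant bit first) to a delay."""
    delays = []
    for byte in message:
        for shift in range(7, -1, -1):
            delays.append(LONG if (byte >> shift) & 1 else SHORT)
    return delays


def decode(delays, threshold=0.10) -> bytes:
    """Recover the bytes by thresholding the observed inter-packet delays."""
    bits = [1 if d > threshold else 0 for d in delays]
    out = bytearray()
    # Ignore a trailing group of fewer than 8 bits.
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


delays = encode(b"data")
recovered = decode(delays)
```

A sender would sleep for each delay between otherwise ordinary packets, so the packet contents carry nothing suspicious; only the timing does.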
                                 Phishing
The user's attention is attracted by the exploit. The information is
used to lure the victim into using a new channel and then to create a
false perception of reality with the goal of exploiting the user’s
behavior.
1. Attacker: Sends a fake email.
2. Misleading email to get user attention: "Visit http://www.cit1zensbank.com".
3. Victim: Enters first name, last name, account number, SSN on the bogus web site.
4. Attacker: Obtains first name, last name, account #, SSN.
Why current IDS cannot be applied to
   attacks on cognitive channels

• Sophistication of attack approaches.
• Increasing frequency and changing nature of attacks.
• Inherent limits of network-based IDS.
• Inability to identify attackers’ goals.
• Inability to identify new attack strategies.
• No guidance for response.
• Often simplistic analysis.

                   Outline

1. Motivation and Terminology
2. PQS Approach
3. Implementation of a PQS detecting
  a. Phishing
  b. Data Exfiltration
  c. Covert Channel
4. Flow Attribution and Aggregation
5. Conclusion and Acknowledgments
         Process Query System
Inputs:   Observable events coming from sensors; Models
Engine:   PQS ENGINE, running Tracking Algorithms
Output:   Hypothesis
Framework for Process Detection

FORWARD PROBLEM (Real World)
  1. An Environment consists of
  2. Multiple Processes (l1 = router failure, l2 = worm, l3 = scan) that produce
  3. Events, over time, that are seen as

INVERSE PROBLEM (Process Detection, PQS)
  4. Unlabelled Sensor Reports, over time, that PQS resolves into
  5. Hypotheses (Hypothesis 1, Hypothesis 2, ..., built from Track 1, Track 2, Track 3) that detect complex attacks and anticipate the next steps
  6. Indicators and Warnings that are used for control, for example:
       129.170.46.3 is at high risk
       129.170.46.33 is a stepping stone
       ......

Sample Console: plot of Track Scores (0 to 1) against Service Degradation (0 to 200) for Track 1, Track 2 and Track 3.
Hierarchical PQS Architecture

TIER 1
  Models         Observations (sensors)            Engine    Hypothesis
  Scanning       Snort, IP Tables                  PQS       Events
  Infection      Snort, Tripwire                   PQS       Events
  Data Access    Samba                             PQS       Events
  Exfiltration   Flow and Covert Channel Sensor    PQS       Events

TIER 2
  Observations:  the Events produced by the TIER 1 engines
  Models:        More Complex Models
  Hypothesis:    output of the TIER 2 PQS, the RESULTS
Hidden Discrete Event System Models
 Dynamical systems with discrete state spaces that are:

 Causal - next state depends only on the past
 Hidden – states are not directly observed
 Observable - observations conditioned on hidden state are
 independent of previous states

 Example. Hidden Markov Model
               N States
               M Observation symbols
               State transition Probability Matrix, A
               Observation Symbols Distribution, B
               Initial State Distribution π
 HDESM models are general
HDESM Process Detection Problem

 Identifying and tracking several (causal discrete state) stochastic
 processes (HDESMs) that are only partially observable.



            TWO MAIN CLASSES OF PROBLEMS

Hidden State Estimation: Determine the “best” hidden state
sequence of a particular process that accounts for a given
sequence of observations.

Discrete Source Separation: Determine the “most likely”
process-to-observation association.
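
For the Hidden State Estimation problem, when the process is a Hidden Markov Model, one standard way to find the “best” state sequence is the Viterbi algorithm. The sketch below is only an illustration under stated assumptions: the two states (`idle`, `scan`), the two observation symbols (`a`, `b`), and all probabilities are made up for the example and do not come from the talk.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = p * emit_p[s][observations[t]]
            back[t][s] = prev
    # Backtrack from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model (N = 2 states, M = 2 observation symbols).
STATES = ["idle", "scan"]
START = {"idle": 0.5, "scan": 0.5}                      # initial distribution
TRANS = {"idle": {"idle": 0.8, "scan": 0.2},            # matrix A
         "scan": {"idle": 0.2, "scan": 0.8}}
EMIT = {"idle": {"a": 0.9, "b": 0.1},                   # distribution B
        "scan": {"a": 0.2, "b": 0.8}}

path, prob = viterbi("aabbb", STATES, START, TRANS, EMIT)
```

With these made-up numbers the observation sequence `aabbb` is best explained by two `idle` steps followed by three `scan` steps.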


Discrete Source Separation Problem
     HDESM Example (HMM):
                             3 states + transition probabilities
                             n observable events: a,b,c,d,e,…
                             Pr( state | observable event ) given/known


                  Observed event sequence:
                  ….abcbbbaaaababbabcccbdddbebdbabcbabe….




Catalog of
Processes

Which combination of which process models “best” accounts for the observations?
Events not associated with a known process are “ANOMALIES”.


                 An analogy....

 What does
                     hbeolnjoulor
 mean?

Events are:    hbeolnjoulor
Models = French + English words (+ grammars!)

                hbeolnjoulor = hello + bonjour

Intermediate hypotheses include tracks:   ho + be
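
The analogy can be made concrete with a brute-force sketch. It is not the PQS algorithm: it only checks, for every pair of words in a small catalog, whether the event string is an order-preserving interleaving of the two words. The catalog contents and the function names `is_interleaving` and `explain` are assumptions made for the example.

```python
from functools import lru_cache
from itertools import combinations


def is_interleaving(events, a, b):
    """True if events can be split into a and b with letter order preserved."""
    if len(a) + len(b) != len(events):
        return False

    @lru_cache(maxsize=None)
    def ok(i, j):
        k = i + j                      # events consumed so far
        if k == len(events):
            return True
        if i < len(a) and a[i] == events[k] and ok(i + 1, j):
            return True
        if j < len(b) and b[j] == events[k] and ok(i, j + 1):
            return True
        return False

    return ok(0, 0)


def explain(events, catalog):
    """Return every pair of catalog words that accounts for all the events."""
    return [(a, b) for a, b in combinations(sorted(catalog), 2)
            if is_interleaving(events, a, b)]


# Hypothetical catalog of "process models": a few French and English words.
CATALOG = {"hello", "help", "world", "bonjour", "bonsoir"}
result = explain("hbeolnjoulor", CATALOG)
```

Partial matches explored by `ok` while the string is being consumed play the role of the intermediate tracks such as `ho + be`.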
              PQS applications
        •   Vehicle tracking
        •   Worm propagation detection
        •   Plume detection
        •   Dynamic Social Network Analysis
        •   Cyber Situational Awareness
        •   Fish Tracking
        •   Autonomic Computing
        •   Border and Perimeter Monitoring
        •   First Responder Sensor Network
        •   Protein Folding


TRAFEN (TRacking and Fusion ENgine):
           Software implementation of a PQS
        Example – vehicle tracking
               (Valentino Crespi, Diego Hernando)
[Figure: vehicle positions at times T, T+1, T+2]

    Continuous Kinematic Model
    Linear Model with Gaussian noise
            Multiple Hypothesis Tracking
D. Reid. An Algorithm for Tracking Multiple Targets. IEEE Transactions on Automatic
Control, 1979

Use Kalman Filter for predictions.
[Figure: hypotheses and predictions at times T, T+1, T+2]
Track = process instance
Hypothesis = consistent tracks
Given a set of “hypotheses” for an event stream of length k-1, update the
hypotheses to length k to explain the new event (based on model description).
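
The update step stated above can be sketched as follows. This is a simplification, not Reid's algorithm: it enumerates hypotheses without scoring or pruning them, and the one-dimensional events, the `close_enough` consistency rule, and its `max_step` value are assumptions made for the example. A hypothesis is a tuple of tracks, and a track is a tuple of events.

```python
def extend_hypotheses(hypotheses, event, fits):
    """Extend every length k-1 hypothesis so that it also explains `event`."""
    extended = []
    for hyp in hypotheses:
        # Option 1: the event continues one of the existing tracks.
        for i, track in enumerate(hyp):
            if fits(track, event):
                extended.append(hyp[:i] + (track + (event,),) + hyp[i + 1:])
        # Option 2: the event starts a new track (a new process instance).
        extended.append(hyp + ((event,),))
    return extended


def close_enough(track, event, max_step=1.5):
    """Hypothetical model: a target moves at most max_step between events."""
    return abs(event - track[-1]) <= max_step


hypotheses = [()]                      # one empty hypothesis before any event
for event in [0.0, 10.0, 1.0, 11.0]:   # two targets, interleaved reports
    hypotheses = extend_hypotheses(hypotheses, event, close_enough)

# Without scores, prefer the hypothesis that needs the fewest tracks.
best = min(hypotheses, key=len)
```

In a real tracker each hypothesis would carry a likelihood from the model, and unlikely hypotheses would be pruned to keep the set from growing exponentially.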
                        Model vehicle Kinematics
States: $x(t_{k+1}) = \Phi(t_k)\, x(t_k) + \Gamma\, w(t_k)$
  $x(t_k)$      State of target at time $t_k$
  $\Phi(t_k)$   Prediction Matrix
  $\Gamma$      Precision Matrix
  $w(t_k)$      Sequence of normal r.v. with zero mean and covariance $Q(t_k)$

                           Model Measurement
Observe state of target through a noisy measurement: $z(t_k) = H x(t_k) + \nu(t_k)$
  $z(t_k)$      Measure (observation)
  $H$           “Observable” Matrix: extracts observable information from state.
  $\nu(t_k)$    Sequence of normal r.v. with zero mean and covariance $R$


                                State Estimation
Kalman filters are used for predictions.
                                   Kalman Filters
At each time step the Kalman filter (KF) alternates two operations:

  Correct the estimation given the new obs
    inputs:  prediction $\bar{x}(t_k), \bar{P}(t_k)$ (estimation given obs before $t_k$) and noisy observation $z(t_k)$
    output:  state estimate $\hat{x}(t_k)$ and error covariance estimation $\hat{P}(t_k)$

  Prediction
    output:  $\bar{x}(t_{k+1})$ and error covariance prediction $\bar{P}(t_{k+1})$

The cycle repeats with $z(t_{k+1})$ to give $\hat{x}(t_{k+1}), \hat{P}(t_{k+1})$.
                                         Kalman Equations
System’s state: $x(t_k) \mid z^{k-1} \sim N(\bar{x}(t_k), \bar{P}(t_k))$ (Normal Multivariate)

Estimation (output):
  $\hat{x}(t_k) = \bar{x}(t_k) + K(t_k)\,[z(t_k) - H \bar{x}(t_k)]$
  $\hat{P}(t_k) = \bar{P}(t_k) - \bar{P}(t_k) H^T (H \bar{P}(t_k) H^T + R)^{-1} H \bar{P}(t_k)$
  $K = \hat{P} H^T R^{-1}$

K is the Kalman Gain: minimizes the updated error covariance matrix
(mean-square error) $E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]$

Posterior $x(t_k) \mid z^k$: mean $\mu = \hat{x}(t_k)$, covariance $\sigma^2 = \hat{P}(t_k)$

New Prediction:
  $\bar{x}(t_{k+1}) = \Phi(t_k)\, \hat{x}(t_k)$
  $\bar{P}(t_{k+1}) = \Phi(t_k)\, \hat{P}(t_k)\, \Phi(t_k)^T + \Gamma\, Q(t_k)\, \Gamma^T$
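
The estimation and prediction equations on this slide can be exercised with a scalar sketch, in which every matrix is a plain number so that transposes disappear and the inverse becomes a division. The function names, the use of `Gamma` for the matrix that scales the process noise, and all numeric values are assumptions made for the example.

```python
def correct(x_bar, P_bar, z, H, R):
    """Estimation step: fold the new observation z into the prediction."""
    S = H * P_bar * H + R                          # innovation covariance
    P_hat = P_bar - P_bar * H * (1.0 / S) * H * P_bar
    K = P_hat * H / R                              # Kalman gain, K = P_hat H^T R^-1
    x_hat = x_bar + K * (z - H * x_bar)
    return x_hat, P_hat, K


def predict(x_hat, P_hat, Phi, Gamma, Q):
    """Prediction step: propagate the estimate to the next time step."""
    x_bar = Phi * x_hat
    P_bar = Phi * P_hat * Phi + Gamma * Q * Gamma
    return x_bar, P_bar


# Hypothetical static target: Phi = 1, no process noise, unit measurement noise.
Phi, Gamma, Q, H, R = 1.0, 1.0, 0.0, 1.0, 1.0
x_bar, P_bar = 0.0, 1.0                            # prior mean and variance
estimates = []
for z in [2.0, 2.0, 2.0]:
    x_hat, P_hat, K = correct(x_bar, P_bar, z, H, R)
    estimates.append(x_hat)
    x_bar, P_bar = predict(x_hat, P_hat, Phi, Gamma, Q)
```

With these values the estimate moves from the prior 0 toward the measured value 2 while the error variance shrinks at each step, which is the behavior the gain is designed to produce.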
        Real time Fish Tracking
                      (Alex Jordan )

• Track the fish in the fish tank
• Very strong example of the power of PQS
   – Fish swim very quickly and erratically
   – Lots of missed observations
   – Lots of noise
   – Classical Kalman filters don’t work (non-linear
     movement and acceleration)
   – “Easier” than getting permission to track people (we
     mistakenly thought)

                Fish Tracking Details
• 5 Gallon tank with 2 red Platys named
  Bubble and Squeak
• Camera generates a stream of
  “centroids”:
    For each frame a series of (X,Y) pairs
      is generated.
• Model describes the kinematics of a
  fish:
    The model evaluates if new (X,Y)
      pairs could belong to the same fish,
      based on measured position,
      momentum, and predicted next
      position. This way, multiple “tracks”
      are formed, one for each object.
• Model was built in under 3 days!!!

[Figure: Infrared Camera]
Detect and differentiate people by behavior, not appearance
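
The kinematic test described in the bullets above can be sketched as a simple gating rule. This is a greedy nearest-neighbor illustration, not the model built for the fish tank: the constant-velocity prediction, the `gate` distance, and the sample centroids are assumptions made for the example.

```python
import math


class Track:
    """One tracked object, stored as its list of (x, y) centroids."""

    def __init__(self, x, y):
        self.points = [(x, y)]

    def predict(self):
        """Predicted next position, assuming constant velocity."""
        if len(self.points) < 2:
            return self.points[-1]
        (x1, y1), (x2, y2) = self.points[-2], self.points[-1]
        return (2 * x2 - x1, 2 * y2 - y1)


def assign(tracks, centroids, gate=5.0):
    """Attach each centroid to the track that predicted it, else start a track."""
    unused = list(centroids)
    for track in tracks:
        if not unused:
            break
        px, py = track.predict()
        nearest = min(unused, key=lambda c: math.hypot(c[0] - px, c[1] - py))
        if math.hypot(nearest[0] - px, nearest[1] - py) <= gate:
            track.points.append(nearest)
            unused.remove(nearest)
    tracks.extend(Track(x, y) for x, y in unused)
    return tracks


# Hypothetical centroids for two fish over three frames.
frames = [[(0, 0), (20, 20)], [(2, 1), (19, 18)], [(4, 2), (18, 16)]]
tracks = []
for frame in frames:
    tracks = assign(tracks, frame)
```

A greedy rule like this breaks down when fish cross paths or observations are missed, which is where keeping several competing hypotheses becomes useful.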
Cybenko
            Autonomic Server Monitoring
                               (Chris Roblee)

  • Objective: Detect and predict deteriorating service situations
  • Hundreds of servers and services
  • Various non-intrusive sensors check for:
     – CPU load
     – Memory footprint
     – Process table (forking behavior)
     – Disk I/O
     – Network I/O
     – Service query response times
     – Suspicious network activities (e.g., Snort)
  • Models describe the kinematics of failures and attacks:
     The model evaluates load balancing problems, memory leaks,
       suspicious forking behavior (like /bin/sh), service hiccups correlated
       with network attacks…
                                  Server Compromise Model:
                Integration of host CPU load sensors and IDS sensor allows
                  detection of attacks not possible with either sensor alone

  1.   Snort NIDS sensor output
                                     .
                                     .
                                     .
          Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7]
          SCAN myscan [Classification: attempted-recon] [Priority: 2]:
          {TCP} 212.175.64.248-> 10.0.0.24
                                     .
                                     .
                                     .

  2.   Monitored host sensor output (system level)
          Current system record for host 10.0.0.24 (10 records):
          Average memory over previous 10 samples: 251.000
          Average CPU over previous 10 samples: 0.970
          | time       | mem used | CPU load | num procs | flag |
          -------------------------------------------------------
          | 1101094903 |   251    |  0.970   |    64     |      |
          | 1101094911 |   252    |  0.820   |    64     |      |
          | 1101094920 |   251    |  0.920   |    64     |      |
          | 1101094928 |   251    |  0.930   |    64     |      |
          | 1101094937 |   251    |  0.870   |    65     |      |
          | 1101094946 |   251    |  0.970   |    65     |      |
          | 1101094955 |   251    |  0.820   |    65     |      |
          | 1101094964 |   253    |  1.220   |    65     |  !   |
          | 1101094973 |   255    |  1.810   |    65     |  !   |
          | 1101094982 |   258    |  2.470   |    65     |  !   |

  3.   PQS Tracker Output
          Last Modified:    Mon Nov 21 21:01:03
          Model Name:       server_compromise1
          Likelihood:       0.9182
          Target:           10.0.0.24
          Optimal Response: SIGKILL proc 6992

  Timeline: observation o1 at t0; observations o1, o2, o3 at t1, t2, t3; response (SIGKILL) at t4.
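
The idea of combining the two sensors can be sketched as a small rule: a scan alert against the host followed by a run of flagged CPU load samples supports the compromise hypothesis. This is an illustrative sketch, not the `server_compromise1` model: the load threshold, the required run length, and the scan timestamp (placed just before the first sample) are assumptions; the CPU load samples are the ones shown on the slide.

```python
# (time, CPU load) samples for host 10.0.0.24, copied from the slide.
SAMPLES = [
    (1101094903, 0.970), (1101094911, 0.820), (1101094920, 0.920),
    (1101094928, 0.930), (1101094937, 0.870), (1101094946, 0.970),
    (1101094955, 0.820), (1101094964, 1.220), (1101094973, 1.810),
    (1101094982, 2.470),
]


def compromise_time(scan_time, samples, threshold=1.0, needed=3):
    """Return the time at which the hypothesis is supported, or None.

    Hypothetical rule: after a scan alert, `needed` consecutive samples
    with CPU load above `threshold` indicate a likely compromise.
    """
    run = 0
    for t, load in samples:
        if t < scan_time:
            continue                   # load before the scan is not evidence
        run = run + 1 if load > threshold else 0
        if run >= needed:
            return t
    return None


SCAN_TIME = 1101094900                 # assumed alignment of the Snort alert
detected_at = compromise_time(SCAN_TIME, SAMPLES)
no_scan = compromise_time(1101095000, SAMPLES)
```

Neither sensor alone triggers the rule: the scan by itself is only reconnaissance, and the load increase by itself could be ordinary work.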
                                          Airborne Plume Tracking
                                             (Glenn Nofsinger)

Forward Problem - drift and diffusion
Inverse Problem - locate sources and types of releases

[Figure: airborne agent sensor on DC Mall; concentration map on a grid from 0 to 182 on both axes, with a scale from 0.0 to 159.4]
Dynamic Social Network Analysis
                          (Wayne Chung)

Example process:
  1. A asks B to join a project
  2. B accepts
  3. A adds B to a list of recipients (A → B, C, …)

Process model: invite, then question/accept, then join or not join

“Static” Analysis versus “Dynamic” Analysis:
 Detect "business" and "social" processes, not static artifacts.
 Examples: large group joining; new member active, introducing others.

 Sensors...communication events
 Models...social processes
              PQS in Computer Security
          (Alex Barsamian, Vincent Berk, Ian De Souza, Annarita Giani)
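[Diagram: a monitored network (Internet, BRIDGE, DMZ with WWW and Mail servers, WS, WinXP and LINUX hosts) instrumented with the sensors DIB:s, BGP, IPTables, Snort, Tripwire, and SaMBa. The sensor observations feed the PQS ENGINE.]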
               Sensors and Models
Sensors
1   DIB:s             Dartmouth ICMP-T3 Bcc: System
2   Snort, Dragon     Signature-matching IDS
3   IPtables          Linux Netfilter firewall, log based
4   Samba             SMB server, file access reporting
5   Flow sensor       Network analysis
6   ClamAV            Virus scanner
7   Tripwire          Host filesystem integrity checker

Models
1   Noisy Internet Worm Propagation: fast scanning
2   Email Virus Propagation: hosts aggressively send emails
3   Low & Slow Stealthy Scans: of our entire network
4   Unauthorized Insider Document Access: insider information theft
5   Multistage Attack: several penetrations, inside our network
6   DATA movement
7   TIER 2 models
                           Phishing Attack
The act of sending an e-mail to a user falsely claiming to be an
established legitimate enterprise in an attempt to scam the user into
surrendering private information.
The e-mail directs the user to visit a web site where they are asked
to update personal information.




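[Diagram:
  1. The e-mail tells the user to visit http://www.cit1zensbank.com
  2. The user enters first name, last name, account number, and SSN on the bogus web site.
  3. The bogus web site passes on the same first name, last name, account number, and SSN.]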
                                  Complex Phishing Attack Steps
Actors:
  Attacker               51.251.22.183
  Stepping stone         100.20.3.127
  Web page, Madame X     165.17.8.126
  Victim                 100.10.20.9

Actions labeled in the diagram (steps 1-6):
  • The victim, as usual, browses the web and visits a web page.
  • The victim inserts username and password (the same used to access his machine).
  • The web page records username and password.
  • accesses user machine using username and password
  • uploads some code
  • downloads some data
  Complex Phishing Attack Observables
Actors:
  Attacker                       51.251.22.183
  Stepping stone                 100.20.3.127
  Web Server used - Madame X     165.17.8.126
  Victim                         100.10.20.9

Observables labeled in the diagram (each drawn between a SOURCE and a DEST host):
  1. RECON           Sept 29 11:17:09   SNORT: KICKASS_PORN; DRAGON: PORN HARDCORE
  2. ATTEMPT         Sept 29 11:23:56   SNORT: SSH (Policy Violation)
  3. DATA UPLOAD     Sept 29 11:23:56   FLOW SENSOR
  5. DATA DOWNLOAD   Sept 29 11:24:07   FLOW SENSOR

Other labels: NON-STANDARD-PROTOCOL; username and password (sent to the web server).
                            Flow Sensor
• Based on the libpcap interface for packet capturing.
• Packets with the same source IP, destination IP, source port, destination
       port, protocol are aggregated into the same flow.


   • Timestamp of the last packet
   • # packets from Source to Destination
   • # packets from Destination to Source
   • # bytes from Source to Destination
   • # bytes from Destination to Source
   • Array containing delays in microseconds between packets in the flow


  We did not use NetFlow because it does not provide all the fields that we need.
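
The aggregation rule above can be sketched in a few lines of Python. This is only an illustration, not the sensor's actual code: the names `Flow`, `FlowTable`, and `add_packet` are hypothetical, packets are assumed to be already parsed (the real sensor reads them through libpcap), and treating the reverse 5-tuple as the same flow is an assumption made so that both packet directions can be counted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (source IP, destination IP, source port, destination port, protocol)
FlowKey = Tuple[str, str, int, int, str]


@dataclass
class Flow:
    """One flow record with the fields listed on the slide."""
    last_ts_us: int = 0            # timestamp of the last packet (microseconds)
    pkts_src_to_dst: int = 0
    pkts_dst_to_src: int = 0
    bytes_src_to_dst: int = 0
    bytes_dst_to_src: int = 0
    delays_us: List[int] = field(default_factory=list)  # inter-packet delays


class FlowTable:
    def __init__(self) -> None:
        self.flows: Dict[FlowKey, Flow] = {}

    def add_packet(self, ts_us: int, src_ip: str, dst_ip: str,
                   src_port: int, dst_port: int, proto: str, size: int) -> Flow:
        key = (src_ip, dst_ip, src_port, dst_port, proto)
        rev = (dst_ip, src_ip, dst_port, src_port, proto)
        if rev in self.flows:
            # Assumption: a reply packet belongs to the flow opened by the
            # first packet, so "source" always means the flow initiator.
            flow, forward = self.flows[rev], False
        else:
            flow, forward = self.flows.setdefault(key, Flow()), True

        # Record the delay since the previous packet of this flow, if any.
        if flow.pkts_src_to_dst + flow.pkts_dst_to_src > 0:
            flow.delays_us.append(ts_us - flow.last_ts_us)
        flow.last_ts_us = ts_us

        if forward:
            flow.pkts_src_to_dst += 1
            flow.bytes_src_to_dst += size
        else:
            flow.pkts_dst_to_src += 1
            flow.bytes_dst_to_src += size
        return flow
```

Keeping the full list of delays, rather than only a mean, is what later allows timing-based analysis of a flow.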
Two Models Based on the Flow Sensor
                         Low and Slow UPLOAD
  Volume              Packets          Duration             Balance   Percentage
  Tiny: 1-128b        4: 10-99         4: 1000-10000 s      Out       >80
  Small: 128b-1Kb     5: 100-999       5: 10000-100000 s
                      6: > 1000        6: > 100000 s

                                UPLOAD
  Volume              Packets          Duration             Balance   Percentage
  Tiny: 1-128b        1: one packet    0: < 1 s             Out       >80
  Small: 128b-1Kb     2: two packets   1: 1-10 s
  Medium: 1Kb-100Kb   3: 3-9           2: 10-100 s
  Large: > 100Kb      4: 10-99         3: 100-1000 s
                      5: 100-999       4: 1000-10000 s
                      6: > 1000        5: 10000-100000 s
                                       6: > 100000 s
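
A minimal sketch of how a flow record could be mapped onto these classes and tested against the two models. Several points are assumptions, not taken from the slides: "b" is read as bytes and "Kb" as 1024 bytes, class boundaries are treated as inclusive on the upper end, volume is the total of both directions, and "Balance Out, Percentage >80" is read as "more than 80% of the bytes are outbound". All function names are hypothetical.

```python
import bisect

VOLUME_NAMES = ["Tiny", "Small", "Medium", "Large"]
VOLUME_UPPER = [128, 1024, 100 * 1024]             # assumed inclusive upper bounds
PACKET_LOWER = [1, 2, 3, 10, 100, 1000]            # lower bounds of classes 1..6
DURATION_LOWER = [1, 10, 100, 1000, 10000, 100000]  # lower bounds of classes 1..6


def volume_class(total_bytes: int) -> str:
    return VOLUME_NAMES[bisect.bisect_left(VOLUME_UPPER, total_bytes)]


def packet_class(packets: int) -> int:
    # 1 -> 1, 2 -> 2, 3-9 -> 3, 10-99 -> 4, 100-999 -> 5, >= 1000 -> 6
    return bisect.bisect_right(PACKET_LOWER, packets)


def duration_class(seconds: float) -> int:
    # < 1 s -> 0, 1-10 s -> 1, ..., >= 100000 s -> 6
    return bisect.bisect_right(DURATION_LOWER, seconds)


def outbound_percentage(bytes_out: int, bytes_in: int) -> float:
    total = bytes_out + bytes_in
    return 100.0 * bytes_out / total if total else 0.0


def is_upload(bytes_out: int, bytes_in: int) -> bool:
    # The UPLOAD table lists every volume, packet, and duration class,
    # so only the balance condition is checked here.
    return outbound_percentage(bytes_out, bytes_in) > 80


def is_low_and_slow_upload(bytes_out: int, bytes_in: int,
                           packets: int, seconds: float) -> bool:
    return (is_upload(bytes_out, bytes_in)
            and volume_class(bytes_out + bytes_in) in ("Tiny", "Small")
            and packet_class(packets) >= 4
            and duration_class(seconds) >= 4)
```

For example, a flow of 1000 bytes (900 outbound) spread over 12 packets and 5000 seconds falls in the Low and Slow UPLOAD model, while the same bytes sent within 5 seconds match only the plain UPLOAD model.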
  Phishing Attack Model 1 – very specific
[State-transition diagram: states 1-7; transitions labeled RECON, ATTEMPT, UPLOAD, and DOWNLOAD.]
 Phishing Attack Model 2 – less specific
[State-transition diagram: states 1-7; transitions labeled "RECON or ATTEMPT or COMPROMISE" (dst), ATTEMPT (dst, src), ATTEMPT (dst, !src), UPLOAD (dst), UPLOAD (dst, src), and DOWNLOAD (src).]
    Phishing Attack Model 3 – more general
[State-transition diagram: states 1-7; transitions labeled "RECON or ATTEMPT or COMPROMISE" (dst; dst, src; dst, !src), UPLOAD (dst; dst, src), and DOWNLOAD (src).]
Phishing Attack Model 4 – most general

[State-transition diagram: states 1-4; transitions labeled RECON, "ATTEMPT or UPLOAD", ATTEMPT, and DOWNLOAD.]

Stricter models reduce false positives, but less strict
   models can detect unknown attack sequences
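
The trade-off can be illustrated with two toy state machines. The transition tables below are hypothetical; they are not recovered from the diagrams, and a real PQS tracks many competing hypotheses over noisy observations, whereas this sketch simply rejects any sequence with an unexpected observation.

```python
from typing import Dict, Iterable, Tuple

# (current state, observation) -> next state
Transitions = Dict[Tuple[int, str], int]

# Hypothetical strict model: exactly one observation order is accepted.
STRICT: Transitions = {
    (1, "RECON"): 2,
    (2, "ATTEMPT"): 3,
    (3, "UPLOAD"): 4,
    (4, "DOWNLOAD"): 5,
}
STRICT_FINAL = 5

# Hypothetical general model: fewer states, alternatives and repeats allowed.
GENERAL: Transitions = {
    (1, "RECON"): 1,
    (1, "ATTEMPT"): 2, (1, "UPLOAD"): 2,
    (2, "ATTEMPT"): 3, (2, "UPLOAD"): 3,
    (3, "ATTEMPT"): 3, (3, "UPLOAD"): 3,
    (3, "DOWNLOAD"): 4,
}
GENERAL_FINAL = 4


def matches(model: Transitions, observations: Iterable[str],
            final: int, start: int = 1) -> bool:
    """Return True if the observation sequence drives the model to `final`."""
    state = start
    for obs in observations:
        nxt = model.get((state, obs))
        if nxt is None:
            return False
        state = nxt
    return state == final


known = ["RECON", "ATTEMPT", "UPLOAD", "DOWNLOAD"]
variant = ["RECON", "UPLOAD", "ATTEMPT", "ATTEMPT", "DOWNLOAD"]

print(matches(STRICT, known, STRICT_FINAL))      # strict model, known attack
print(matches(STRICT, variant, STRICT_FINAL))    # strict model, reordered attack
print(matches(GENERAL, variant, GENERAL_FINAL))  # general model, reordered attack
```

The strict table accepts only the known ordering, while the general table also accepts the reordered variant, at the price of accepting more benign sequences as well.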
   Air Force Rome Lab Blind Test
                           December 12-14, 2005

The data set is an anonymized stream of network traffic captured with
tcpdump, amounting to hundreds of gigabytes of raw traffic.


    • Valuable feedback on performance and design
    • Strengths:
       – Number of sensors integrated
       – Number of models
       – Ease of sensor integration
       – Ease of model building
    • Drawback:
       – System is real-time (results time-out)

  Complex Phishing Attack Results
                         No observations from Dragon    Using Dragon and
                         sensor and Flow sensor         Flow observations
Attack steps             0 of 5                         5 of 5
Background attackers     9 of 15                        10 of 15
Background scanners      25 of 55                       23 of 55
Stepping stones          0 of 1                         1 of 1
False alarms             -                              1
                                                        Summary of Results
[Figure: three bar charts, each on a 0-100 scale, over scenarios 4s1, 4s3, 4s4, 4s13, 4s14, 4s5, 4s6, 4s8, 4s16, 4s17, for threshold values 0.0, 0.5, and 0.75:
  Precision (goal: > average)
  Fragmentation (goal: < average)
  Mis-Associations (goal: < average)
Scenario 4s14: Phishing attack]
                       Data Exfiltration

                CNN.COM
 The Problem:   Sunday, June 19, 2005 Posted: 0238 GMT (1038 HKT)

                NEW YORK (AP) -- The names, banks and account numbers
                of up to 40 million credit card holders may have been
                accessed by an unauthorized user, MasterCard
                International Inc. said.




PQS Approach:   Tier 1 models monitor outbound data. They are based on flow
                analysis.

                Tier 2 models correlate outbound data within a context to infer
                whether it reflects normal system and user behavior or an
                ongoing attack.


                       Basic Ideas: An Example

Exfiltration modes:
  • SSH
  • HTTP
  • FTP
  • Email
  • Covert channel
  • Phishing
  • Spyware
  • Pharming
  • Writing to media
      • paper
      • drives
  • etc.

[Figure: bytes IN and OUT for nfs2.pqsnet.net over time (x 15 sec), on a scale up to 600000 bytes. Labeled phases: Normal activity, Scanning, Infection, Data Access. Increased outbound data moves the assessment from a low likelihood to a high likelihood of malicious exfiltration.]
Hierarchical PQS Architecture
  TIER 1 model     TIER 1 observations (sensors)     TIER 1 hypothesis = TIER 2 observation
  Scanning         Snort, IP Tables                  Events
  Infection        Snort, Tripwire                   Events
  Data Access      Samba                             Events
  Exfiltration     Flow and Covert Channel Sensor    Events

Each Tier 1 model runs in its own PQS. The Tier 1 hypotheses (events) are the
observations of a Tier 2 PQS with more complex models; the Tier 2 hypotheses
are the RESULTS.
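
As a rough illustration of the two tiers, the sketch below takes Tier 1 events (already produced by the Scanning, Infection, Data Access, and Exfiltration models) and correlates them per host in Tier 2. The scoring rule, the tuple layout, and the function name are invented for this example; they are not the PQS tracking algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A Tier 1 hypothesis, used as a Tier 2 observation: (time, host, event name)
Event = Tuple[float, str, str]

# Tier 1 events that give context to outbound data (names from the slide).
CONTEXT = ("Scanning", "Infection", "Data Access")


def tier2_exfiltration_scores(events: List[Event]) -> Dict[str, float]:
    """For each host with an Exfiltration event, return the fraction of
    context events that were seen on that host beforehand (0.0 to 1.0)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    scores: Dict[str, float] = {}
    for _, host, event in sorted(events):
        if event == "Exfiltration":
            score = len(seen[host]) / len(CONTEXT)
            scores[host] = max(score, scores.get(host, 0.0))
        elif event in CONTEXT:
            seen[host].add(event)
    return scores


events = [
    (10.0, "hostA", "Scanning"),
    (20.0, "hostA", "Infection"),
    (30.0, "hostA", "Data Access"),
    (40.0, "hostA", "Exfiltration"),   # outbound data after suspicious context
    (15.0, "hostB", "Exfiltration"),   # outbound data with no context
]
print(tier2_exfiltration_scores(events))
```

The point of the hierarchy is visible even in this toy version: the same outbound flow is rated differently depending on what the other Tier 1 models reported for that host.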
Example PQS model: Macro in Word document for exfiltration

[State-transition diagram: states 1-4; transitions labeled TIER 1 VIRUS, RECON, Balanced Flow, Data Exfiltration, and "Balanced Flow or Data Exfiltration".]

A Word virus opens an FTP connection to a server and uploads documents.
                     Covert Channel
• A communication channel is covert if it is neither designed
      nor intended to transfer information at all. (Lampson
      1973)

• Covert channels are those that use entities not normally
     viewed as data objects to transfer information from one
     subject to another. (Kemmerer 1983)

STORAGE
Information is leaked by hiding data in packet header fields: IP
identification, Offset, Options, TCP Checksum, TCP Sequence Numbers.

TIMING
Information is leaked by triggering or delaying events at specific time
intervals.
Covert Channel in Interpacket Delays




[Diagram: the SENDER transmits a stream of packets over the INTERNET to the RECEIVER. The packets carry ordinary text ("We shall not spend a large expense of time / Before we reckon with your several loves, / And make us even with you. My thanes and kinsmen, …"). The delays between consecutive packets encode the bits 0 1 0 0 0 1 0 1 0, which the receiver reads back from the delays it observes.]
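
A minimal sketch of such a timing channel, assuming a two-delay encoding: a short gap between packets stands for 0 and a long gap for 1. The delay values, the threshold, and the function names are hypothetical and are not taken from the experiments.

```python
from typing import List

SHORT, LONG = 0.1, 0.2   # assumed inter-packet delays (seconds) for bits 0 and 1


def encode(bits: List[int]) -> List[float]:
    """Map each covert bit to the delay inserted before the next packet."""
    return [LONG if b else SHORT for b in bits]


def send_times(delays: List[float], start: float = 0.0) -> List[float]:
    """Timestamps of len(delays) + 1 packets separated by the given delays."""
    times = [start]
    for d in delays:
        times.append(times[-1] + d)
    return times


def decode(arrival_times: List[float],
           threshold: float = (SHORT + LONG) / 2) -> List[int]:
    """Recover the bits by thresholding the observed inter-packet delays."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return [1 if g > threshold else 0 for g in gaps]


bits = [0, 1, 0, 0, 0, 1, 0, 1, 0]          # bit sequence from the slide
arrivals = send_times(encode(bits))
print(decode(arrivals) == bits)
```

Network jitter shifts the observed gaps; once it pushes a gap across the threshold, the receiver decodes the wrong bit, which is the kind of error a noisy-channel model accounts for.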
 Binary Asymmetric Channel

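Sent:      1 0 0 0 0 1
Received:  1 0 0 0 0 0     (ERROR: the last bit should be a 1)

[Diagram: noisy channel mapping inputs 1 and 0 to outputs 1 and 0, with each transition labeled by its probability in terms of the error probability Pe.]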
 Binary Asymmetric Channel Capacity
Capacity: Highest amount of information per symbol that can be
transmitted with arbitrarily small error probability.




[Figure: plot with axes labeled "Error Probability" and "Bit/symbols", annotated "24 hops." and "Received" / "Sent".]
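
For a binary asymmetric channel, the capacity is the mutual information between input and output maximized over the input distribution. The sketch below does this by a simple grid search; the parameter names `p01` and `p10` (the two crossover probabilities) and the grid resolution are choices made for this example, not values from the slides.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(q: float, p01: float, p10: float) -> float:
    """I(X;Y) in bits for P(X=1) = q, P(Y=1|X=0) = p01, P(Y=0|X=1) = p10."""
    p_y1 = q * (1 - p10) + (1 - q) * p01
    return h2(p_y1) - (q * h2(p10) + (1 - q) * h2(p01))


def capacity(p01: float, p10: float, steps: int = 10000) -> float:
    """Approximate capacity (bits per symbol) by scanning q over [0, 1]."""
    return max(mutual_information(i / steps, p01, p10) for i in range(steps + 1))


print(capacity(0.0, 0.0))    # noiseless channel
print(capacity(0.1, 0.1))    # symmetric errors
print(capacity(0.0, 0.5))    # only 1 -> 0 errors occur
```

When the two error probabilities are equal, this reduces to the familiar binary symmetric channel, whose capacity is 1 - H(Pe).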
Statistical Detection

[Figure: histogram of inter-packet delays; x-axis: delay (tenths of a second), y-axis: number of packets.]

$\mu$ = sample mean of the delays
$N(\mu)$ = number of packets with delay $\mu$
$N_{max}$ = maximum number of packets with the same delay

$N(\mu) / N_{max} \ll 1$: covert channel

Level of confidence: $1 - N(\mu) / N_{max}$ (the quantity thresholded in the PQS experiments)
                                                             Sensor
For every traffic flow it registers the time delays between
consecutive packets.

Key:
    source ip:       129.170.248.33
    dest ip:         208.253.154.210
    source port:     44806
    dest port:       23164

Attributes:
    Protocol:
    TotalSize:
    #Delays[20]:     3 0 0 16 882 2 0 17 698 2 0 0 1 0 1 0 0 0 0 0
    Average delay:
    Cmax:
    Cmean:

Reading #Delays: 3 delays between 0 sec and 1/40 sec; 882 delays between 4/40 sec and 5/40 sec.
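
Applying the confidence measure 1 - N(μ)/N_max to the #Delays histogram above can be sketched as follows. The sketch approximates the mean delay from the histogram itself, using each bin's index as its delay in units of 1/40 sec; that approximation and the function name are assumptions, since the sensor records the average delay as its own attribute.

```python
from typing import Sequence


def covert_confidence(hist: Sequence[int]) -> float:
    """Return 1 - N(mu)/N_max for a histogram of inter-packet delays."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    n_max = max(hist)
    # Mean delay in units of the bin width, approximated from bin indices.
    mean_bin = sum(i * count for i, count in enumerate(hist)) / total
    n_mu = hist[int(mean_bin)]       # packets whose delay falls in the mean's bin
    return 1 - n_mu / n_max


# Histogram from the sensor slide: two peaks, around 4/40 sec and 8/40 sec.
two_peaks = [3, 0, 0, 16, 882, 2, 0, 17, 698, 2, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical ordinary flow: a single peak, so the mean sits on the peak.
one_peak = [0, 0, 5, 900, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(covert_confidence(two_peaks))
print(covert_confidence(one_peak))
```

With two delay peaks the mean falls in the nearly empty region between them, so the confidence is close to 1; with a single peak the mean lands on the peak and the confidence drops to 0.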
[Figure: two histograms of number of delays vs. delay (tenths of a second), both for source ip 129.170.248.33 and dest ip 208.253.154.210:
  source port 56441, dest port 23041: $1 - N(\mu) / N_{max} = 0$
  source port 56441, dest port 23036: $1 - N(\mu) / N_{max} = 1$]
              Error rates and capacity

[Figure: error probability and capacity (bit/symbol) shown together with the confidence $1 - N(\mu) / N_{max}$.]
           Detection-Capacity Tradeoffs
   is a discrete random variable.      A sample of      is denoted by

   Define a covert channel,      which has the same sample space as         namely
uses        in the sense that whenever               a covert message is communicated
  is a sample of

 Let                             and

          is the probability of FALSE ALARM

           S = {a, b, c, d, e, f, g, h}      Sample space

           D = {b, c, d}       Symbols used for covert communication



  The probability of D according to the natural distribution of symbols is the
  false alarm rate.
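
A small worked example with the sets above. The natural distribution over S is not given on the slide, so a uniform one is assumed here; `max_bits_per_covert_symbol` uses the standard bound log2|D|, reached only when the covert symbols are used equally often. Both function names are hypothetical.

```python
import math
from typing import Dict, Iterable

S = "abcdefgh"                       # sample space from the slide
D = {"b", "c", "d"}                  # symbols used for covert communication

# Assumption: every symbol is equally likely in normal traffic.
natural: Dict[str, float] = {s: 1 / len(S) for s in S}


def false_alarm_rate(natural: Dict[str, float], covert: Iterable[str]) -> float:
    """Probability that normal traffic produces a symbol from the covert set."""
    return sum(natural[s] for s in set(covert))


def max_bits_per_covert_symbol(covert: Iterable[str]) -> float:
    """Upper bound on the information one covert symbol can carry."""
    return math.log2(len(set(covert)))


print(false_alarm_rate(natural, D))          # 3 of 8 equally likely symbols
print(max_bits_per_covert_symbol(D))
```

Enlarging D lets each covert symbol carry more information but also raises the probability that normal traffic triggers a detector watching for D, which is the trade-off named in the slide title.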
              Detection-Capacity Tradeoffs
  Let     be the amount of information communicated by the covert channel        per

sample from       Define      to be the entropy of the distribution




Noting that                    by assumption, then




 The expected covert information communicated per sample is


[Figure: covert information plotted against false alarm rate.]
      Flow Analysis = Data Reduction
  Level      Volume                            Annotation
  EVENTS     Hundreds per hour                 Flow Aggregation: fewer events to be analyzed
  FLOWS      Thousands per hour                Flow Attribution
  PACKETS    Hundreds of thousands per hour    Current Analysis
  BYTES      Billions per hour                 How data move
  Flow Attribution and Aggregation

FLOW ATTRIBUTION

The final goal is to attribute flows to people. Intermediate steps are a
required part of the attribution process.

Uses logs that can explain a flow as legitimate or malicious.

The goal is to explain flows.

FLOW AGGREGATION

Recognizing that different flows (components), apparently totally
unrelated, nevertheless belong to the same broader action (event).

Views flows as components of broader activities.

The goal is to correlate flows based on certain criteria.
                           Aggregation

Flow aggregation.
Recognizing that different flows, apparently totally unrelated,
nevertheless belong to the same broader event (activity).
Flows are aggregated from captured network packets.
We aggregate flows into activities.

Example:
• User requests a webpage (all DNS and HTTP flows aggregated)

Activity aggregation.
Recognizing that similar activities occur regularly at the same time, or
dissimilar activities occur regularly in the same sequence.
We correlate activities into activity groups, patterns.

Examples:
• Nightly backups to all servers (each backup is an activity)
• User requests a sequence of webpages every morning.

           Packet = Aggregated Bytes
           Flow = Correlated Packets
           Activity = Correlated Flows
           Pattern = Correlated Activities
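
The first step of this hierarchy, packets into flows, might be sketched as follows. The `Packet` fields, the 60-second idle timeout, the direction-independent 5-tuple key, and the documentation-range addresses are all illustrative assumptions, not details taken from the slides.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float      # capture time in seconds
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    size: int      # bytes


def flow_key(p):
    """Direction-independent 5-tuple so both directions share one flow."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def packets_to_flows(packets, idle_timeout=60.0):
    """Group packets by 5-tuple; a gap longer than idle_timeout starts a new flow."""
    flows, open_flows = [], {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = flow_key(p)
        flow = open_flows.get(key)
        if flow is None or p.ts - flow["end"] > idle_timeout:
            flow = {"key": key, "start": p.ts, "end": p.ts, "packets": 0, "bytes": 0}
            open_flows[key] = flow
            flows.append(flow)
        flow["end"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    return flows


# Hypothetical capture: a DNS query and reply, then one HTTP packet.
packets = [
    Packet(0.0, "192.0.2.5", 5353, "192.0.2.1", 53, "udp", 60),
    Packet(0.1, "192.0.2.1", 53, "192.0.2.5", 5353, "udp", 120),
    Packet(0.2, "192.0.2.5", 40000, "198.51.100.7", 80, "tcp", 500),
]
flows = packets_to_flows(packets)
print(len(flows), flows[0]["packets"], flows[0]["bytes"])
```

Three packets reduce to two flow records here, the same kind of data reduction the deck describes at larger scale.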
                    Web Surfing in Detail
  The browser breaks the URL into three parts: the protocol ("http"), the
    server name ("www.dartmouth.edu") and the file name ("index.html").

1. The browser communicates with a name server to translate the
   server name "www.dartmouth.edu" into an IP address, which it uses to
   connect to the server machine. [A FLOW IS INITIATED]
2. The browser forms a connection to the web server at that IP
   address on port 80. [A FLOW IS INITIATED]
3. Following the HTTP protocol, the browser sends a GET request to the
   server, asking for the file "http://www.dartmouth.edu/index.html".
4. The web server sends the HTML text for the Web page to the browser.
5. The browser reads the HTML tags and formats the page onto your
   screen.
6. The browser possibly initiates more DNS requests for media such as
   images and video. [MULTIPLE FLOWS ARE INITIATED]
7. The browser initiates more HTTP and/or FTP requests for media.
   [MULTIPLE FLOWS ARE INITIATED]
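
The DNS and HTTP flows produced by the steps above can be grouped into a single activity. The sketch below assumes a simple rule, namely that flows from the same client starting within a short gap of one another belong together; the flow records, the 2-second gap, and the address are hypothetical.

```python
def flows_to_activities(flows, gap=2.0):
    """Group each client's flows into activities.

    ASSUMPTION: a flow joins the client's current activity when it starts
    within `gap` seconds of that client's previous flow.
    """
    current_by_client = {}
    activities = []
    for f in sorted(flows, key=lambda flow: flow["start"]):
        current = current_by_client.get(f["client"])
        if current is None or f["start"] - current[-1]["start"] > gap:
            current = []
            current_by_client[f["client"]] = current
            activities.append(current)
        current.append(f)
    return activities


# Hypothetical flows for one page load, followed by an unrelated later lookup.
flows = [
    {"client": "192.0.2.5", "proto": "udp", "dport": 53, "start": 0.00},  # DNS
    {"client": "192.0.2.5", "proto": "tcp", "dport": 80, "start": 0.05},  # HTTP GET
    {"client": "192.0.2.5", "proto": "udp", "dport": 53, "start": 0.30},  # DNS for media
    {"client": "192.0.2.5", "proto": "tcp", "dport": 80, "start": 0.35},  # HTTP for media
    {"client": "192.0.2.5", "proto": "udp", "dport": 53, "start": 30.0},  # later lookup
]
activities = flows_to_activities(flows)
print([len(a) for a in activities])  # [4, 1]
```

A time gap alone is a crude criterion; a fuller version would also use the DNS answers to tie each HTTP flow to the lookup that preceded it.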
           Resulting Flows and Activity

[Figure: an activity and the flows in the activity.]
    Activities and Flows

[Figure: an activity made up of UDP flows and TCP flows, including one long flow.]
               Complex Activities ....

[Figure: correlated network flows within a LAN, showing a TCP portscan,
regular UDP broadcasts (NTP), a system upgrade, regular browsing/download
behavior, and a UDP portscan.]
                        Flow + Snort Alerts
Scenario: several packets in a flow triggered IDS alerts.

Snort rule 1560 generates an alert when an attempt is made to exploit a
known vulnerability in a web server or a web application.

Snort rule 1852 generates an alert when an attempt is made to access the
'robots.txt' file directly.

[Figure: Snort alerts shown against the flow that triggered them.]

The flow can be characterized as malicious, and further investigation must be done.
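
Attaching IDS alerts to the flow that triggered them can be sketched as a join on the 5-tuple and the timestamp. The record layouts below are hypothetical and are not Snort's actual output format; only the rule IDs 1560 and 1852 come from the slide.

```python
def attach_alerts(flows, alerts):
    """Return {flow index: [rule ids]} for alerts that fall inside a flow.

    An alert matches a flow when the 5-tuple keys are equal and the alert
    timestamp lies between the flow's start and end times.
    """
    hits = {}
    for alert in alerts:
        for i, flow in enumerate(flows):
            same_key = alert["key"] == flow["key"]
            in_window = flow["start"] <= alert["ts"] <= flow["end"]
            if same_key and in_window:
                hits.setdefault(i, []).append(alert["sid"])
    return hits


# Hypothetical flow records; keys are (proto, src, sport, dst, dport).
web = ("tcp", "198.51.100.9", 40000, "192.0.2.80", 80)
dns = ("udp", "192.0.2.5", 5353, "192.0.2.1", 53)
flows = [
    {"key": web, "start": 10.0, "end": 12.5},
    {"key": dns, "start": 11.0, "end": 11.1},
]
alerts = [
    {"sid": 1560, "ts": 10.4, "key": web},
    {"sid": 1852, "ts": 11.9, "key": web},
]

hits = attach_alerts(flows, alerts)
suspicious = sorted(i for i, sids in hits.items() if len(sids) >= 2)
print(hits, suspicious)
```

Flagging flows with two or more distinct alerts is an arbitrary threshold chosen for the example; the slide says only that such a flow warrants further investigation.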
                                Current focus
Theoretical approach for clustering aggregated flows.
                          Flow = As defined
                          Activity = Aggregated flows
                          Pattern = Correlated Activities

  Approach: Graph theory (flows are the nodes, and edges connect
  correlated nodes).
  We are thinking about defining a metric that captures the closeness
  between two different activities to allow grouping into patterns.

  [Figure: two activity graphs. Activity 1 has nodes x, y, z, t, w, s;
  Activity 2 has nodes x, y, t, s.]
  Can they be grouped in one pattern?
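
One candidate for such a closeness metric, offered only as an assumption since the slides leave the metric undefined, is the Jaccard similarity between the node sets of two activities. The sketch uses the node labels from the two example activities (x, y, z, t, w, s and x, y, t, s) and ignores edges entirely.

```python
def jaccard(a, b):
    """Jaccard similarity of two node sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty activities are treated as identical
    return len(a & b) / len(a | b)


# Node labels of the two example activities.
activity_1 = {"x", "y", "z", "t", "w", "s"}
activity_2 = {"x", "y", "t", "s"}

similarity = jaccard(activity_1, activity_2)
print(round(similarity, 3))  # 0.667
```

Whether 0.667 is close enough to group the two activities into one pattern depends on a threshold that would have to be chosen empirically. A metric that also compares edges, such as graph edit distance, would capture structure that this node-only measure misses.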
5. Conclusion and Acknowledgments
                  Contribution

• Identification of a new generation of threats.

• Identification and implementation of approaches
       based on a Process Query System to detect
       them.

• Introduction and implementation of flow attribution
       and aggregation.




                Work in Progress

• Build a theory of flow attribution and aggregation.

• Develop a theory of tractability to characterize phenomena
      in the sense of multi-hypothesis tracking.

• Identify new application domains.

• Develop a statistical theory of undetectable covert communication.
                    Acknowledgements

Active Members
George Cybenko
Alex Barsamian
Marion Bates
Chad Behre
Vincent Berk
Valentino Crespi (Cal State LA)
Ian deSouza
Paul Thompson
Annarita Giani

Alumni
Robert Gray (BAE Systems)
G. Jiang (NEC LAB)
Naomi Fox (UMass, Ph.D. student)
Hrithik Govardhan, MS (Rocket Software)
Yong Sheng, Ph.D. (Dartmouth CS postdoc)
Josh Peteet, MS (Greylock Partners)
Alex Jordan, MS (BAE Systems)
Chris Roblee, MS (Lawrence Livermore NL)
George Bakos (Northrop Grumman)
Doug Madory, M.Sc. (BAE Systems)
Wayne Chung, Ph.D. (IDA/CCS)
Glenn Nofsinger, Ph.D. (BAE Systems)
Yong Sheng, Ph.D. (CS Dartmouth College)

Research Support: DARPA, DHS, ARDA/DTO, ISTS, I3P, AFOSR, Microsoft
    www.pqsnet.net
annarita@dartmouth.edu




                   Thanks!

				