New Directions in Network Intrusion Detection


									New Directions
  Presented to CS694
   October 16, 1998
     Jeremy Elson
does security matter?
 Would you care if someone could:

 •   Crash your computer every 5 minutes?
 •   Subtly change your thesis?
 •   Change your bank statement?
 •   Sink a ship?
     – (Think it’s unlikely? It happened; see NY
       Times, vol. CXLIII, page E7 - Oct. 24, 1993)
     when will it matter?
    Some systems and/or protocols are designed with security
    in mind from the beginning -- maybe even as their primary
    design goal. But for most? The story’s the same…

•   Protocol design? (Nah, that’s an application problem)
•   Application design? (We plan to add that in the future...)
•   Application deployment? (Let’s get it running first)
•   System administration? (I’m putting out fires every day!)
               houston, we have
                 a problem...
Type of site       # of hosts   Total %      % Yellow   % Red
                   scanned      vulnerable
banks              660          68.33        32.73      35.61
credit unions      274          51.09        30.66      20.44
US federal sites   47           61.70        23.40      38.30
newspapers         312          69.55        30.77      38.78
sex                451          66.08        40.58      25.50
Totals             1734         64.94        33.85      31.08

       Yellow = Probably Hackable; Red = Trivially Hackable

Source: Independent Security Survey By Dan Farmer, Dec 1996
system vulnerabilities
• Almost all vulnerabilities come from bugs
  in the implementation of, or
  misconfigurations of, the OS and/or apps
• Rarely, a problem with a protocol itself
• Vulnerabilities can lead to:
  – Unauthorized access: attacker gains control of
    the victim’s machine (attacker can log in, read
    files, and/or make changes to the system)
  – Denial of Service against host (attacker can
    crash the computer, disable services, etc.)
  – Denial of Service against network (attack can
    disrupt routing, flood the network, etc.)
            security incidents
             reported to CERT

              [chart not reproduced]
                   Source: CERT/CC --
           who is the enemy?
          • The Troubled Genius
            – Has a deep understanding of systems
            – Capable of finding obscure vulnerabilities in
              OS’s, apps, and protocols, and exploiting them
            – Extremely skilled at evading countermeasures
            – Can dynamically adapt to new environments
          • The Idiot
            – Little or no true understanding of systems
            – Blindly downloads & runs code written by T.G.
            – Can usually be stopped by calling his mother
          Who do you think causes more damage?
• The idiots collectively cause more damage
  because there are a vast number of them
• Every security incident I analyzed while I
  was at NIH was the work of an idiot
• Every time smart hackers find a new security
  hole, they make it public -- they have a
  publish or perish “ethic”
• Each time, hordes of idiots pounce on it and
  break into every system they can find
publish or perish
or, good help is not hard to find
 the never-ending game
1. New bugs are found; exploits are published
2. Hordes of idiots cause damage using those exploits
3. Vendors are pressured to come out with fixes
4. Users install the fixes (sometimes? rarely?)
5. Go to step 1.
            The big questions are:
1. How can we protect a large site? (The site is only as
   strong as its most poorly administered machine.)
2. How can we pro-actively protect against attacks that
   we have never seen before, to avoid Step 2 damage?
   the rest of my talk
• Bro, Vern Paxson’s network security
  monitor, attempts to get a handle on site-
  wide security monitoring in a way that
  firewalls can’t -- more on that soon
• Computer Immune Systems, from Stephanie
  Forrest at UNM, attempts to solve Problem
  2 -- but only on a host, not for an entire site
• My idea: how to combine the best of both
  securing your system
  the quick & easy way
“It’s easy to run a secure computer
  system. You just have to disconnect all
  dial-up connections and permit only
  direct-wired terminals, put the machine
  and its terminals in a shielded room,
  and post a guard at the door.”

                 F.T. Grampp and R.H. Morris

(Great! Let’s go to the router room with some bolt cutters.)
  (not as good as bolt cutters, but…)
• Routers: easy to say “allow everything but…”
• Firewalls: easy to say “allow nothing but…”
• This helps because we turn off access to
  everything, then evaluate which services are
  mission-critical and have well-understood risks
• Note: in my opinion the only difference
  between a router and a firewall is the design
  philosophy; do we prioritize security, or
  connectivity/performance? (configurability, logging)
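
The two philosophies above can be sketched as a deny list versus an allow list. This is only an illustration: the port numbers and the names ROUTER_BLOCKED and FIREWALL_ALLOWED are made up for the example, not taken from any real configuration.

```python
# Hypothetical rule sets; the port numbers are illustrative only.
ROUTER_BLOCKED = {23, 111}     # "allow everything but..." (deny list)
FIREWALL_ALLOWED = {25, 80}    # "allow nothing but..." (allow list)


def router_permits(port: int) -> bool:
    """Default-allow: a port passes unless someone has listed it as bad."""
    return port not in ROUTER_BLOCKED


def firewall_permits(port: int) -> bool:
    """Default-deny: a port passes only if someone has evaluated and listed it."""
    return port in FIREWALL_ALLOWED


if __name__ == "__main__":
    # 31337 stands in for a service nobody has thought about yet.
    for port in (25, 23, 31337):
        print(port, "router:", router_permits(port),
              "firewall:", firewall_permits(port))
```

The service nobody has evaluated gets through the default-allow rule and is stopped by the default-deny rule, which is the point of turning everything off first.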
typical firewall setup
[Diagram: the evil Internet, the DMZ, and the internal network]

    Diagram courtesy of CheckPoint Software Tech,
     the firewall setup
• Firewall ensures that the internal network and
  the Internet can both talk to the DMZ, but
  usually not to each other
• The DMZ relays services at the application
  level, e.g. mail forwarding, web proxying
• The DMZ machines and firewall are centrally
  administered by people focused on security
  full-time (installing patches, etc.); it’s easier
  to secure 20 machines than 20,000
• Now the internal network is “safe” (but not
  from internal attacks, modems, etc.)
      firewall politics
• In a corporate environment, firewalls are
  great. The network user is an employee of the
  network service provider; it’s in the
  provider’s power to say “Thou Shalt Not Use
  Any Internet Services Except For These...”
• How well do you think that would work here?
  Dear Professor _____________ ,

  Our firewall has detected your attempt to use the network
  protocol ________. This protocol is not supported under
  the USC Security Policy. Please cease these activities at
  once. Any further infractions will result in your
  disconnection from our network.
big brother is watching
• “Bro” passively monitors the network at
  some key location (say, the border router)
• Reconstructs flows and searches for known
  “attack signatures” -- a manually created
  database, based on known network attacks
• Provides real-time notification of security
  personnel when it sees something suspicious
• Future versions may actively terminate
  connections by sending forged TCP RST
        thoughts on bro
+ It provides a nice site-wide view of security
+ It’s not disruptive to users
+ It’s centrally administered
– Unlike a firewall, which stops badness before
  it starts, bro’s alarm may come too late
– It can’t flag attacks that are not in its database
  of known attack signatures
– It cannot reliably determine what an end-station
  is seeing, for a variety of reasons
         subverting bro
  (we’ll start with the easy ones)

• Let’s say we’re trying to scan for the string
  “su root”.
  – What if I type “su me^H^Hroot”?
  – What if I type “su<telnet option> root”?
  – What if I type “alias blammo su”, then type
    “blammo root”?
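
A small sketch of the first evasion, assuming the monitor simply searches the raw keystroke stream. The function names are hypothetical and this is not how Bro itself is written; it only shows why a raw substring search misses the backspace trick and what a monitor would have to emulate to catch it.

```python
def naive_match(keystrokes: str, pattern: str = "su root") -> bool:
    """Search the raw keystroke stream, as a simple signature scanner might."""
    return pattern in keystrokes


def apply_backspaces(keystrokes: str) -> str:
    """Emulate what the terminal line ends up holding after ^H (backspace)."""
    out = []
    for ch in keystrokes:
        if ch == "\b":
            if out:
                out.pop()      # backspace erases the previous character
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    typed = "su me\b\broot"    # "su me^H^Hroot"
    print(naive_match(typed))                    # raw stream
    print(naive_match(apply_backspaces(typed)))  # after emulating the terminal
```

Emulating backspaces handles only the first trick. The alias trick still gets through, because the monitor would have to track shell state as well.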
        reconstructing flows
      • Let’s say you want to search for the text
        “USER root”. Is it enough to just search the
        data portion of TCP segments you see?
                            USER root

            TCP:   HDR     USER     HDR    root

IP:     HDR HDR US       HDR ER       HDR HDR ro       HDR ot

       (Uh oh… we have to reassemble frags and resequence segs)
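
A minimal sketch of the resequencing step, assuming segments are given as (sequence number, payload) pairs and that nothing overlaps or is retransmitted. The sequence numbers 1000 and 1005 are invented for the example.

```python
def reassemble(segments):
    """Rebuild the contiguous byte stream from (seq, payload) pairs.

    Segments may arrive in any order. The sketch stops at the first gap
    or overlap instead of trying to resolve it.
    """
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments):
        if next_seq is None:
            next_seq = seq         # stream starts at the lowest sequence number
        if seq != next_seq:
            break                  # gap or overlap: not handled here
        stream += payload
        next_seq += len(payload)
    return bytes(stream)


if __name__ == "__main__":
    # "USER " and "root" travel in separate segments and arrive out of order.
    arrived = [(1005, b"root"), (1000, b"USER ")]
    print(any(b"USER root" in payload for _, payload in arrived))
    print(b"USER root" in reassemble(arrived))
```

Searching each payload on its own never sees the full string; searching the reassembled stream does.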
       fun with fragments
                  Imagine an attacker sends:

                 1.    HDR HDR US

                 2.    HDR ER
                 3.   1,000,000 unrelated fragments
                 4.    HDR HDR ro

                 5.    HDR ot
Think of the entire campus as being a massively parallel computer.
That supercomputer is solving the flow-reconstruction problem.
Now we’re asking a single host to try to solve that same problem.
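
The state problem can be sketched with a monitor that keeps partial streams in a table of fixed size. The class BoundedMonitor, the capacity of 1000, and the evict-the-oldest policy are all assumptions made for the illustration, and the flood is scaled down from the slide's 1,000,000 so the example runs quickly.

```python
from collections import OrderedDict


class BoundedMonitor:
    """Keeps the bytes seen so far for each flow, for at most `capacity` flows."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.streams = OrderedDict()   # flow id -> bytes seen so far

    def observe(self, flow_id, payload: bytes) -> bool:
        """Record one piece of a flow; return True if the signature appears."""
        data = self.streams.pop(flow_id, b"") + payload
        self.streams[flow_id] = data               # re-insert as most recent
        while len(self.streams) > self.capacity:
            self.streams.popitem(last=False)       # evict least recently seen
        return b"USER root" in data


def attack_detected(flood_size: int, capacity: int) -> bool:
    """Send US, ER, a flood of unrelated pieces, then ro, ot."""
    monitor = BoundedMonitor(capacity)
    alerts = [monitor.observe("attacker", b"US"),
              monitor.observe("attacker", b"ER ")]
    for i in range(flood_size):
        monitor.observe(("noise", i), b"x")        # unrelated traffic
    alerts.append(monitor.observe("attacker", b"ro"))
    alerts.append(monitor.observe("attacker", b"ot"))
    return any(alerts)


if __name__ == "__main__":
    print(attack_detected(flood_size=10, capacity=1000))
    print(attack_detected(flood_size=100_000, capacity=1000))
```

Once the flood is larger than the table, the first half of the attacker's stream has been evicted by the time the second half arrives, and no alert is raised. Each end host only has to hold state for its own connections; the monitor has to hold it for everyone.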
          more fragment fun
                  Imagine an attacker sends
        (sequence number runs left to right, time runs downward):

    1.    HDR HDR US
    2.              HDR ER
    3a.                     HDR HDR ro
    3b.                     HDR HDR fo
    4.                              HDR ot
Should we consider 3a part of the data stream “USER root”?
Or is 3b part of the data stream? “USER foot”!
-- If the OS makes a different decision than the monitor: Bad.
-- Even worse: Different OS’s have different protocol interpretations,
so it’s impossible for Bro to agree with all of them
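
The disagreement can be sketched with two reassembly policies applied to the same packets. The byte offsets and the prefer_first flag are assumptions for the example; real stacks have more varied overlap rules than these two.

```python
def reassemble(segments, prefer_first: bool) -> bytes:
    """Rebuild a stream from (offset, payload) pairs given in arrival order.

    prefer_first=True  keeps the bytes that arrived first on overlap.
    prefer_first=False lets later bytes overwrite earlier ones.
    Assumes the segments leave no gaps.
    """
    stream = {}
    for offset, payload in segments:
        for i, byte in enumerate(payload):
            pos = offset + i
            if prefer_first and pos in stream:
                continue               # keep what we already have
            stream[pos] = byte
    return bytes(stream[pos] for pos in sorted(stream))


if __name__ == "__main__":
    # 3a ("ro") and 3b ("fo") claim the same two bytes of the stream.
    arrived = [(0, b"US"), (2, b"ER "), (5, b"ro"), (5, b"fo"), (7, b"ot")]
    print(reassemble(arrived, prefer_first=True))
    print(reassemble(arrived, prefer_first=False))
```

One policy yields "USER root" and the other "USER foot". If the monitor picks one policy and the end-station uses the other, the monitor is reading a different stream than the victim.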
• Non-standard parts of standards
   – IP fragment overlap behavior
   – TCP sequence number overlap behavior
   – Invalid combinations of TCP options
• Other ways to force a disparity between the
  monitor and the end-station
   – TTL
   – Checksum
   – Overflowing monitor buffers
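
The TTL trick can be sketched as follows, assuming the monitor sits 3 routers from the attacker and the end-station 6 routers away (both distances are invented). The sketch ignores sequence numbers entirely and only shows which packets each observer receives.

```python
MONITOR_ROUTERS = 3    # hypothetical: routers between attacker and monitor
HOST_ROUTERS = 6       # hypothetical: routers between attacker and end-station


def survives(ttl: int, routers: int) -> bool:
    """Each router decrements the TTL and discards the packet at zero."""
    return ttl > routers


def view(packets, routers: int) -> bytes:
    """Concatenate the payloads that make it past `routers` routers."""
    return b"".join(payload for payload, ttl in packets
                    if survives(ttl, routers))


if __name__ == "__main__":
    # The middle packet has a TTL large enough to reach the monitor
    # but too small to reach the end-station.
    packets = [(b"USER ", 64), (b"XX", 5), (b"root", 64)]
    print(view(packets, MONITOR_ROUTERS))
    print(view(packets, HOST_ROUTERS))
```

The monitor sees extra bytes in the middle of the string and finds no match, while the end-station receives "USER root" intact.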

See for detailed examples
         is bro useless?
• Of course not.
  – Remember, most of the problems are caused by
    idiots; they don’t know how to subvert bro and
    the techniques can’t be pre-packaged easily.
• It doesn’t cost us anything.
  – Just because the monitor can be subverted
    doesn’t mean we can’t use it. Using it doesn’t
    mean we are making a tradeoff, so why not?
• There is no silver bullet.
  – Don’t expect any system to single-handedly
    “solve” the security problem. Take what you
    can get.
  the reverse approach
• Systems like Bro define “bad” -- anything
  they don’t recognize, therefore, is assumed
  to be good.
  – Problem: Your “bad” list is always out of date
• Other systems attempt to define “good” --
  anything they don’t recognize is “bad”
  – Now, new badness is automatically caught!
  – Problem: How do you define “good”?
     the immune system
• Stephanie Forrest et al. were inspired by
  biological immune systems.
  – A biological immune system doesn’t have a
    catalog of all viruses that exist in the world
  – Instead, it has a strong sense of “self”, allowing
    it to identify and attack non-self entities
• Is the same thing possible on the computer?
• Motivation: UNIX processes; there is a
  well-known and easy technique for getting
  almost any buggy program to execute
  arbitrary code by just sending it carefully
  constructed input!
getting to know yourself
1. Find a metric that characterizes the system.

2. Build up a database of normal values for
   that metric when the system is working as it should.

3. Continually monitor the metric; set off an
   alarm if it deviates from the database.

4. Test the metric for false positives/negatives.
   applying the method
• First target application: sendmail (infamous
  for many security holes)

• First metric: system call traces

• Normal database to be built up by recording
  sendmail’s behavior in a wide variety of
  everyday tasks (many types of messages)
 system call traces
               Sample call sequence:
open, read, mmap, mmap, open, getrlimit, mmap, close

call          call+1        call+2            call+3
open          read          mmap              mmap
read          mmap          mmap              open
mmap          mmap, open    open, getrlimit   getrlimit, mmap
getrlimit     mmap          close
database in training
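
A sketch of how such a table could be filled in from the sample call sequence, using a lookahead of 3 to match the call+1 to call+3 columns. The function name build_database and the nested-dictionary layout are choices made for this example, not taken from Forrest's implementation.

```python
from collections import defaultdict

SAMPLE = ["open", "read", "mmap", "mmap",
          "open", "getrlimit", "mmap", "close"]


def build_database(trace, lookahead=3):
    """Record, for each call, which calls were seen 1..lookahead positions later."""
    db = defaultdict(lambda: defaultdict(set))
    for i, call in enumerate(trace):
        for k in range(1, lookahead + 1):
            if i + k < len(trace):
                db[call][k].add(trace[i + k])
    return db


if __name__ == "__main__":
    db = build_database(SAMPLE)
    for call in db:
        # One row per call, one column per lookahead position.
        row = [", ".join(sorted(db[call][k])) for k in (1, 2, 3)]
        print(f"{call:<10}", " | ".join(row))
```

Run over the whole sample sequence, this gives the finished table, which has more entries than the in-training snapshot shown on the slide (for example, getrlimit also follows open).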
   the normal database
• Using a window size of 6, running sendmail
  through its paces produced a database of
  only 1500 entries and was stable!
• This is only 5 x 10^-5 % of all possible entries
• The small size of the database is critical:
  – Big database = variability in “normal” =
    difficulty in detecting anomalies
  – Big database = no realtime monitoring
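
Monitoring against a normal database can be sketched as counting lookahead pairs that the database has never seen. For brevity the example uses a lookahead of 3 and the eight-call sample sequence as the whole training set; the "execve" in the test trace is an invented anomaly.

```python
NORMAL = ["open", "read", "mmap", "mmap",
          "open", "getrlimit", "mmap", "close"]


def build_database(trace, lookahead):
    """Map (call, distance) to the set of calls seen that far ahead."""
    db = {}
    for i, call in enumerate(trace):
        for k in range(1, lookahead + 1):
            if i + k < len(trace):
                db.setdefault((call, k), set()).add(trace[i + k])
    return db


def count_mismatches(db, trace, lookahead):
    """Return (mismatches, pairs checked) for a trace against the database."""
    mismatches = checked = 0
    for i, call in enumerate(trace):
        for k in range(1, lookahead + 1):
            if i + k < len(trace):
                checked += 1
                if trace[i + k] not in db.get((call, k), set()):
                    mismatches += 1
    return mismatches, checked


if __name__ == "__main__":
    db = build_database(NORMAL, lookahead=3)
    suspect = ["open", "read", "mmap", "mmap",
               "open", "execve", "mmap", "close"]
    print(count_mismatches(db, NORMAL, 3))
    print(count_mismatches(db, suspect, 3))
```

Each check is a set lookup, so the cost per system call depends on the window size rather than on how long the process has been running. That is part of why a small database matters for realtime use.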
            Anomaly                   %       Num
            sunsendmailcp             4.1     95
                    remote 1          4.2     470
                    remote 2          1.5     137
                    local 1           4.2     398
                    local 2           3.4     309
            decode                    0.3     24
            lprcp                     1.4     12
            sm565a                    0.4     36
            sm5x                      1.7     157
            forward loop              1.8     58
(sm565a and sm5x were unsuccessful attacks; forward loop was
nonmalicious anomalous behavior; others were successful breakins.)
• Programs seem to exhibit remarkable amounts
  of fine-grained consistency when operating
  normally; this can be used to detect anomalies.
• Since we now know what’s “good”, we can
  report badness that we have never seen before
• Will not help with things like determining that a
  user has stolen another user’s password
• A solution for one host, not an entire site
• Current system runs off-line (on-line planned)
           related work
• Various expert systems for analyzing logs
  – Systems remain vigilant even given megs of log
    data every day, where humans throw away data
  – Defines a set of events (e.g. directory
    modification, password file access, etc.)
  – Complex statistical algos for reporting anomalies
    while still adaptively learning new user behavior
• Keystroke Dynamics - knows how users type
        bringing it all together
• Bro is powerful in that it can monitor an
  entire site, but weak in that it can’t predict
  what future attack profiles will look like
• Forrest’s work, and other systems mentioned,
  all suggest you can do well by adaptively
  learning “normal” and reporting deviations
• Forrest’s work shows that surprisingly high-
  level characteristics of a system can become
  evident by looking at events on an extremely
  low level, fine grain, and small time scale
                 my idea
• Based on motivations mentioned in the
  previous slide, I propose a new type of
  network intrusion detector:
  – Monitors network traffic at the packet level
  – Creates per-flow packet traces similar to system
    call traces (e.g. SYN -> SYNACK -> ACK;
    ACK -> DATA -> ACK)
  – Uses various other metrics (e.g. % of total
    traffic that is SYN, ACK, RST; ratio of ACKs
    to data; packet size distribution; distribution of
    source and destination port numbers)
  – Adaptively learns what is “normal” for both
    traces and other metrics; reports abnormalities
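
A rough sketch of the proposal, assuming packets have already been reduced to (flow id, packet type) pairs. The window size of 3, the type names, and every function here are hypothetical; no such detector exists in the talk.

```python
from collections import defaultdict


def per_flow_traces(packets):
    """Group (flow_id, packet_type) pairs into one ordered trace per flow."""
    flows = defaultdict(list)
    for flow_id, ptype in packets:
        flows[flow_id].append(ptype)
    return flows


def windows(trace, size):
    """All runs of `size` consecutive packet types in a trace."""
    return {tuple(trace[i:i + size]) for i in range(len(trace) - size + 1)}


def learn(packets, size=3):
    """Collect every window seen during normal operation."""
    normal = set()
    for trace in per_flow_traces(packets).values():
        normal |= windows(trace, size)
    return normal


def anomalies(normal, packets, size=3):
    """Per flow, the windows that never occurred in training."""
    report = {}
    for flow_id, trace in per_flow_traces(packets).items():
        unseen = windows(trace, size) - normal
        if unseen:
            report[flow_id] = unseen
    return report


def syn_fraction(packets):
    """One of the aggregate metrics: the share of packets that are SYNs."""
    return sum(1 for _, ptype in packets if ptype == "SYN") / len(packets)


if __name__ == "__main__":
    handshake = ["SYN", "SYNACK", "ACK", "DATA", "ACK", "FIN", "ACK"]
    training = [("a", t) for t in handshake] + [("b", t) for t in handshake]
    normal = learn(training)
    odd = [("x", "SYN"), ("x", "RST"), ("x", "SYN")]
    print(anomalies(normal, odd))
    print(syn_fraction(odd))
```

Flows shorter than the window produce no windows at all, so something like a SYN flood (many one-packet flows) would have to be caught by the aggregate metrics rather than the per-flow traces.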
       more on my idea
• I think it would capture a wide variety of
  hard-to-see protocol-bug-based attacks
  – SYN Flood, Land, Teardrop, Smurf, plus (most
    importantly) whatever hasn’t been invented yet
• Would probably see attacks on services
  (e.g. port scanning on a host, service
  scanning across many hosts -- DNS bug!)
• Would even see deviations from normal
  behavior on regularly used services (e.g.,
  catching a PHF bug or keystrokes to httpd)
 problems with my idea
• Still not a useful way to find things like
  stolen passwords
• The variations in protocol implementations
  on the Net may mean that normal behavior
  will not exhibit self-similarity
• Might miss things that could be more
  reliably detected by a pattern-matcher -- but
  why not run Bro and SIS at the same time?
  (contrived acronym: Segment Initiated Security)
• Probably a significant effort to build and
  characterize the system and I don’t have the
  time to do it :-)
that’s all, folks!

      Thank You!
   backup slides
answering questions
 it hasn’t leveled off
• I think the growth has remained exponential,
  although CERT’s reports flattened in 1995
• People nowadays don’t know what CERT is
• People don’t report incidents
  –   It’s time-consuming
  –   It gets you in trouble with your boss
  –   It’s embarrassing
  –   It’s proprietary information
        the smurf attack
 Typically: evil has slow link (modem)
             victim has fast link (T1)
             big has very fast link (T3+)

 evil -> big:      Source: victim    Dest: big (broadcast addr)
 big -> victim:    Source: big       Dest: victim
      buffer overflows
        on the stack
                    func 2’s address      func 1’s address

           buf                     c, d             a, b

func_3()                       func_2()         func_1()
{                              {                {
   char buf[100];                 int c, d;        int a, b;
    read_user_input(buf);          func_3();         func_2();
}                              }                }
        buffer overflows
          on the stack
                       buf’s address
                     func 2’s address        func 1’s address

         buf                         c, d             a, b

func_3()                         func_2()         func_1()
{                                {                {
   char buf[100];                   int c, d;        int a, b;
     read_user_input(buf);           func_3();         func_2();
}                                }                }

    Attacker is supplying input to buf… so buf gets a very
    carefully constructed string containing assembly code,
    which also overwrites func 2’s address with buf’s address.
    When func_3 returns, it will branch to buf instead of func_2.
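
The overwrite can be simulated without touching real memory. The sketch below models func_3's stack frame as a byte array: 100 bytes of buf followed directly by a 4-byte little-endian return address. That layout, and both addresses, are simplifying assumptions; real frames also hold saved registers and padding.

```python
BUF_SIZE = 100
ADDR_SIZE = 4
BUF_ADDR = 0xBFFF0000        # hypothetical address of buf on the stack
FUNC_2_RETURN = 0x08048000   # hypothetical return address into func_2


def call_func_3(user_input: bytes) -> int:
    """Simulate func_3 and return the address it branches to on return."""
    # Frame layout (assumed): buf, then the saved return address.
    frame = bytearray(BUF_SIZE) + FUNC_2_RETURN.to_bytes(ADDR_SIZE, "little")
    # read_user_input(buf) with no bounds check: the copy can run past buf.
    frame[0:len(user_input)] = user_input
    return int.from_bytes(frame[BUF_SIZE:BUF_SIZE + ADDR_SIZE], "little")


if __name__ == "__main__":
    print(hex(call_func_3(b"hello")))
    # 100 bytes standing in for the attacker's code, then buf's own address.
    payload = b"\x90" * BUF_SIZE + BUF_ADDR.to_bytes(ADDR_SIZE, "little")
    print(hex(call_func_3(payload)))
```

Short input leaves the return address alone. Input of 104 bytes replaces it with buf's address, so the simulated return lands on the attacker's bytes.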
