

The dark world of Security…

Scalability, Fidelity & Containment in the Potemkin Virtual Honeyfarm

   Presenter: Raghavan Srinivasan
Overview of today's presentation…

•   Internet malware.
•   Objective – solution.
•   Honeypots – a brief introduction.
•   Honeyfarms.
•   Potemkin – related work.
•   Tricks.
•   Architecture.
•   Questions & discussion.
Internet malware – compromising
large number of hosts…
• DDoS extortion.
• Online identity theft.
• Phishing - attempts to acquire passwords
  and credit card details, by masquerading
  as a trustworthy person or business.
• Piracy.
• Gathering intelligence on new malware –
  monitoring activity at a large scale or
  capturing behavior with high fidelity.
• Scalability ( ~6 ).
• Close emulation of the execution of the
  individual internet hosts.
• Potemkin – a prototype honeyfarm system
  that exploits
    • Virtual machines.
    • Aggressive memory sharing.
    • Late binding of resources.
Honeypots – a brief introduction…
• Honeypots have emerged as the principal
  tool for gathering intelligence on new
  means & methods used by attackers.
• Open to attacks.
• Placed carefully so as to attract attackers.
• Types
  – Low interaction.
  – High interaction.
• What is a honeypot?
  – A honeypot is an information system resource whose
    value lies in unauthorized or illicit use of that resource.
  – In practice: a network-connected system that is carefully
    monitored and frequently left unprotected in order to
    detect & analyze intrusions.
• Use of honeypots
  – Create antivirus signatures – to limit further growth.
  – Develop disinfection algorithms – eradicate existing
    infections.
  – Catch the attacker red-handed.
• Honeyfarm – large network of honeypots.
• Scalability, Fidelity & Containment
• Containment – preventing compromised
  honeypots from attacking third-party
  systems.

Low interaction honeypots        High interaction honeypots
{ scales, low fidelity }         { low scalability – high cost,
                                   high fidelity }
Honeyfarm system architecture…
• Scale to design points previously reserved
  for stateless monitors.
• Offering fidelity qualitatively similar to high-
  interaction honeypots.
• Key ideas :
  – Dynamically bind physical resources to
    external requests only for short periods of
    time.
• Based on a specialized network gateway
  and virtual machine monitor derived from
• Each flow at the network layer is
  dispatched to a honeyfarm server.
• Each server in turn creates a new VM for
  every flow, emulating a different IP for each
  flow, so that flows are kept distinct.
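The dispatch step above can be sketched as follows. This is a minimal illustration under stated assumptions, not Potemkin's implementation: the class names `Dispatcher` and `HoneyfarmServer`, the VM labels, and the least-loaded server choice are all hypothetical.

```python
class HoneyfarmServer:
    """One physical server; VMs are created only when traffic arrives."""

    def __init__(self, name):
        self.name = name
        self.vms = {}  # emulated IP -> VM label

    def handle(self, dst_ip):
        # Late binding: no VM exists for dst_ip until the first packet shows up.
        if dst_ip not in self.vms:
            self.vms[dst_ip] = f"{self.name}/vm{len(self.vms) + 1}"
        return self.vms[dst_ip]


class Dispatcher:
    """Gateway-side table that pins each emulated IP to one server."""

    def __init__(self, servers):
        self.servers = servers
        self.assignment = {}  # emulated IP -> HoneyfarmServer

    def dispatch(self, dst_ip):
        if dst_ip not in self.assignment:
            # Hypothetical policy: pick the server running the fewest VMs.
            self.assignment[dst_ip] = min(self.servers, key=lambda s: len(s.vms))
        return self.assignment[dst_ip].handle(dst_ip)


servers = [HoneyfarmServer("s1"), HoneyfarmServer("s2")]
gateway = Dispatcher(servers)
first = gateway.dispatch("10.0.0.1")
second = gateway.dispatch("10.0.0.2")
repeat = gateway.dispatch("10.0.0.1")
```

Repeated traffic for the same emulated IP reaches the same VM, while a new IP triggers VM creation on whichever server is least loaded.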
Potemkin contd…
• Problem: number of VMs running.
• Solution:
  – Flash cloning – copying & modifying the
    reference image.
  – Delta virtualization – lazy initialization –
    memory coherence.
Related work…
• Telescope – passive monitor – cannot complete
  the TCP handshake – cannot analyze the attack.
• Honeypots employ active responders & reply to
  inbound transactions – problem overcome –
  stateless implementation & hence scalability is
• Provos – a low interaction honeypot.
• High interaction honeypots present an
  environment by which they emulate real users to
  a maximum extent & thus are able to monitor the
  attacker with greater fidelity.
• A server for each IP connection.
Problem of scalability…
• Virtual machines to the rescue.
  – VMware.
  – Xen.
  – Virtual PC.
  – User mode Linux – Is this similar to
    application process threads?
Advantages of VMMs…
• Easy to manage.
• Individual VMs can be loaded, frozen or stored
  on demand.
• Reduces the server deployment cost.
• Crash of a VM does not mean crash of the entire
  physical system, since other VMs are still
  running and there are no interactions between
  the VMs.
• Real time systems
  – 8 VMs per physical host.
  – Scales up to 2000 IP hosts – upper bound.
Architecture overview…
• Objectives that primarily shape the architecture
   – Scalability.
   – High interaction fidelity.
   – Containment.
• Scalability
   – Placement of the honeypots is very important.
   – Honeypots as a standalone implementation has very little
     significance – external input improves the efficiency of the
   – Most of a honeypot's processor cycles are wasted idling, waiting
     for an external trigger.
   – Most of a honeypot's memory is idle as well and hence wasted.
   – Duplication of data and state.
Architecture – scalability addressed…
  Network packet arrives → specialized gateway router →
  dynamic assignment of IP to the packet → virtual honeyfarm
Architecture – scalability addressed…
• Packet with an IP address arrives.
• Flash cloning – a lightweight VM from a reference image.
• Delta virtualization – lazy initialization based on
  divergence – memory coherence.
Virtual machine monitor…
 • Each IP address spawns a new VM.
 • Provides isolation.
 • Expensive if implemented naively.
 • Gateway preserves heterogeneity virtually.
 • Each honeypot server is homogeneous.
 • Hence VM can be merely created by
   copying - flash cloning.
 • Copy-on-write semantics - delta virtualization.
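A toy model of the two tricks above, assuming memory is a list of pages. `ReferenceImage` and `ClonedVM` are hypothetical names for illustration; a real VMM does this at the page-table level, not in Python objects.

```python
class ReferenceImage:
    """Read-only memory image of a booted reference VM (toy model)."""

    def __init__(self, pages):
        self.pages = tuple(pages)  # immutable, shared by every clone


class ClonedVM:
    """A clone shares the reference pages and stores only its own changes."""

    def __init__(self, image, ip):
        self.image = image  # flash cloning: share the image, do not copy it
        self.ip = ip
        self.delta = {}     # page number -> private copy, filled lazily

    def read(self, page_no):
        # Unmodified pages are served straight from the shared image.
        return self.delta.get(page_no, self.image.pages[page_no])

    def write(self, page_no, data):
        # Copy-on-write: only a written page gets a private version.
        self.delta[page_no] = data

    def private_pages(self):
        return len(self.delta)


image = ReferenceImage(["kernel", "libs", "services", "free"])
vm_a = ClonedVM(image, "10.0.0.1")
vm_b = ClonedVM(image, "10.0.0.2")
vm_a.write(3, "worm payload")
```

Here `vm_a` holds one private page after the write, while `vm_b` and the reference image are untouched; the memory cost of a clone grows only with how far it diverges.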
Flash cloning…
Delta virtualization…
• VMs can be quickly killed after an attack
  as most attacks are unsuccessful.
• What if a VM is compromised ?
• Honeyfarm – incubator for malware – third
  party liability.
• Trivial soln. – no outgoing packets.
• Honeywall system – outgoing only in
  response to incoming packets.
• Proposed soln. – gateway router policy.
Gateway router policy…
• Track communication patterns.
• Scrub std. outbound service – E.g. DNS request.
• Centralization
  – increases load.
  – But simplifies mgt. & policy specialization – E.g.
    internal reflection.
• Universal identifier that captures the causal
  relationships of communication.
• Infection can spread inside the honeyfarm, which
  simulates the Internet, but the gateway does not
  allow the infection to spread outside it.
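The containment options discussed so far can be sketched as a gateway decision function. This is an assumption-laden illustration: the `Gateway` class, the `Action` values and the `reflect_unknown` switch are hypothetical, and the real gateway policy is richer than a single lookup.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"  # let the packet leave the honeyfarm
    DROP = "drop"        # trivial containment
    REFLECT = "reflect"  # internal reflection: answer from inside the farm


class Gateway:
    def __init__(self, reflect_unknown=False):
        self.seen_inbound = set()  # (external_ip, honeypot_ip) pairs
        self.reflect_unknown = reflect_unknown

    def inbound(self, src, dst):
        # Remember who contacted which honeypot.
        self.seen_inbound.add((src, dst))
        return Action.FORWARD

    def outbound(self, src, dst):
        # Honeywall-style rule: replies to a host that contacted us are allowed.
        if (dst, src) in self.seen_inbound:
            return Action.FORWARD
        # Anything else is either dropped or redirected into the honeyfarm.
        return Action.REFLECT if self.reflect_unknown else Action.DROP


strict = Gateway()
strict.inbound("198.51.100.7", "10.0.0.5")
reply = strict.outbound("10.0.0.5", "198.51.100.7")
scan = strict.outbound("10.0.0.5", "203.0.113.9")
reflected = Gateway(reflect_unknown=True).outbound("10.0.0.5", "203.0.113.9")
```

With reflection enabled, a worm's outbound scan is answered by another honeypot, so its spreading behaviour can be observed without reaching third parties.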
Gateway router policy - containment…

[Figure: VMs in the honeyfarm carrying worms Wx and Wy, the gateway
 router, and external node X. Caption: cross contamination –
 distinct contagions.]
Solution – universal identifier…
• Additional address aliasing mechanism.
• Captures causal relationships of comm.
• External node „X‟ sends a packet „P‟ to the
• Universal identifier – UPX .
• Packets within the same universe can only be
• Captures symbiotic, inter-worm behavior by
  designating “mix-in” universe – sounds like
  explicit policies need to be written ?
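One possible reading of the universe idea, offered as an assumption rather than the paper's stated mechanism: every packet from an external node opens a new universe, and the same IP address maps to a different VM in each universe, so distinct contagions never share a VM. `UniverseTable` and its method names are hypothetical.

```python
import itertools


class UniverseTable:
    """Maps (universe id, IP address) to a VM so contagions stay separate."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.vms = {}  # (universe_id, ip) -> VM label

    def new_universe(self):
        # Assumed to be called once per packet arriving from an external node.
        return next(self._next_id)

    def vm_for(self, universe_id, ip):
        key = (universe_id, ip)
        if key not in self.vms:
            self.vms[key] = f"vm-u{universe_id}-{ip}"
        return self.vms[key]


table = UniverseTable()
u1 = table.new_universe()  # worm Wx arrives from outside
u2 = table.new_universe()  # worm Wy arrives from outside
vm_x = table.vm_for(u1, "10.0.0.9")
vm_y = table.vm_for(u2, "10.0.0.9")
vm_x_again = table.vm_for(u1, "10.0.0.9")
```

A "mix-in" universe would then be an explicit policy that lets two universe ids resolve to the same VMs, which this sketch does not attempt.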
Gateway router – brain of the system…
 • Functions:
   – Direct incoming traffic to individual honeyfarm
     servers.
   – Manage the containment of outbound traffic.
   – Implement long-term resource mgt. across
     honeyfarm servers.
   – Interface with detection, analysis and user-
     interface components.
Inbound traffic…
 • Attracts inbound traffic through
   – Routing – visible to traceroutes
   – Tunneling – invisible to traceroutes.
 • Reducing the load on the gateway
   – Network port scanning.
   – Implementation of filters.
   – Pattern matching algorithms.
   – Elimination of NAT load.
Outbound traffic…
 • Has to apply the same containment policy
   to all.
 • “Fast-spread” contact policy.
 • Gateway needs to support a DNS
   mechanism for the honeypot servers
   inside the system.
   – Either implement a DNS server itself
   – Proxied to a dedicated DNS server.
 • Only a subset of the servers are dedicated
   to internal reflection.
Resource allocation & detection…
 • Easy to decide when a VM has to be created.
 • Difficult to decide when to reclaim.
 • When an attack becomes unsuccessful –
   remove resources from the VM.
 • VMs attacked need to persist in order to further
   analyze, log & manipulate.
 • Some factors
   –   Load balancing.
   –   Resources are low.
   –   Frozen to secondary storage.
   –   Which can continue execution.
 • Paper unclear on the issue of reclaim…
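Since the reclaim rule is left open, the following is only a hypothetical policy built from the factors listed above: idle, uncompromised VMs are destroyed, compromised VMs persist, and they are frozen to secondary storage when resources are low. The 60-second timeout and all names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VMState:
    name: str
    last_activity: float  # timestamp of the last packet seen, in seconds
    compromised: bool = False


def plan_reclaim(vms, now, idle_timeout=60.0, low_memory=False):
    """Return a dict mapping VM name to 'keep', 'destroy' or 'freeze'."""
    plan = {}
    for vm in vms:
        idle = now - vm.last_activity >= idle_timeout
        if vm.compromised:
            # Attacked VMs must persist for analysis; freeze only under pressure.
            plan[vm.name] = "freeze" if low_memory else "keep"
        elif idle:
            # The attack went nowhere, so the resources can be taken back.
            plan[vm.name] = "destroy"
        else:
            plan[vm.name] = "keep"
    return plan


vms = [
    VMState("idle", last_activity=0.0),
    VMState("busy", last_activity=95.0),
    VMState("owned", last_activity=0.0, compromised=True),
]
normal_plan = plan_reclaim(vms, now=100.0)
pressure_plan = plan_reclaim(vms, now=100.0, low_memory=True)
```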
Evaluation contd…
Voids – admitted by the paper itself…
 • Attract traffic
   – Spreading along application specific
   – “Honey monkey” coming soon…
 • Honeypot detection
   – Agobot strains.
   – Current x86 arch. doesn't support complete
   – Static IP addresses – dynamic assignment –
   – Delayed response reveals existence of virtual
Voids – contd…
 • DOS attacks
 • Requires careful allocation of resources.
 • Assumption – address space is not
   exhausted very quickly.
 • My opinion
   – Comprehensive.
   – Exhaustive.
   – Addresses several issues.
   – Some issues have been left unsolved –
     potential for some research….
• How are honeypots carefully provisioned?
• Attacks specially designed for VM that can
  penetrate through the VM and disrupt the
  underlying OS?
• Has there been a venture to set up a similar
  honeyfarm in a distributed fashion instead of
  placing them geographically close to each other ?
  What are the special issues that need to be
  considered in this case ? – Is HoneyMonkey this
  distributed approach ?
• Secure & robust threads instead of VMs ?
Automated Web Patrol with
  Strider HoneyMonkeys
•   Problem @ hand.
•   Proposed solution.
•   Browser based vulnerabilities.
•   The HoneyMonkey system.
•   Evaluation.
•   Questions & Discussion.
Problem @ hand…
• Several attacks exploit browser vulnerabilities
  and install malware.
• E.g.
• Download.Ject
• Bofra
• Current state – manual analysis.
• Unable to scale.
• Do not provide a comprehensive picture.
Proposed solution…
• Active, client-side, VM – based honeypots
  called Strider HoneyMonkey.
• Performs large-scale, systematic &
  automated web patrol.
• Uses monkey programs running on OSes of
  various patch levels to mimic human browsing.
• Adopts a state-management methodology.
• Use of Strider Tracer.
Browser based vulnerability exploits…



Code obfuscation…
• Dynamic code injection – document.write()
  function inside a script.
• Unreadable code – decoded using
  unescape() function.
• Custom decoding routine.
• Substring replacement using replace()
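Two of the obfuscation tricks above, reproduced in Python for illustration. `urllib.parse.unquote` stands in for the percent-decoding part of JavaScript's `unescape()` (it does not handle the `%uXXXX` form), and the script string and `example.invalid` URL are made up.

```python
from urllib.parse import unquote

# What the attacker wants the browser to run (hypothetical payload).
script = "document.write('<iframe src=http://example.invalid/x>')"

# Unreadable form: every character percent-encoded, as fed to unescape().
encoded = "".join(f"%{ord(c):02X}" for c in script)

# What the browser recovers at run time.
decoded = unquote(encoded)

# Substring replacement: junk markers are stripped just before use.
hidden = "docZZument.wrZZite".replace("ZZ", "")
```

Neither `encoded` nor the junk-padded string contains the text `document.write`, which is why simple string matching on the page source misses such code.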
URL redirection…
• Primary URL → Secondary URL
• Protocol redirection using HTTP 302
  temporary redirect.
• HTML tags.
• Script functions including
Vulnerability exploitation…
• Exploitation of multiple browser
  vulnerabilities.
• Owing to its popularity, IE is a frequent target.

Malware installation…
• Introduce some piece of arbitrary code on
  the victim machine in order to achieve a
  larger attack goal.
HoneyMonkey system…
• Automatically detect and analyze a
  network of websites that exploit browsers.
Exploit detection system…
• Stage 1 – scalable mode by visiting N-
• Stage 2 – perform recursive redirected
• Stage 3 – scan exploit URLs using fully
  patched VMs.
Exploit detection - XML report…
• Executable files created or modified
  outside the browser sandbox folders.
• Processes created.
• Windows registry entries created or
  modified.
• Vulnerability exploited.
• Redirect-URLs visited.
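The first report item can be illustrated with a small check. This is a sketch, not the Strider Tracer logic: the suffix list, the function name `suspicious_writes` and the example paths are assumptions.

```python
from pathlib import PureWindowsPath

# Hypothetical set of file types treated as executable.
EXECUTABLE_SUFFIXES = {".exe", ".dll", ".sys"}


def suspicious_writes(written_paths, sandbox_folders):
    """Return executables written outside every browser sandbox folder."""
    sandboxes = [PureWindowsPath(folder) for folder in sandbox_folders]
    flagged = []
    for raw in written_paths:
        path = PureWindowsPath(raw)
        if path.suffix.lower() not in EXECUTABLE_SUFFIXES:
            continue
        # A write is harmless here if any sandbox folder is an ancestor.
        inside = any(sandbox in path.parents for sandbox in sandboxes)
        if not inside:
            flagged.append(raw)
    return flagged


sandbox = [r"C:\Users\monkey\Temporary Internet Files"]
writes = [
    r"C:\Users\monkey\Temporary Internet Files\ad.exe",
    r"C:\Windows\System32\evil.exe",
    r"C:\Users\monkey\notes.txt",
]
flagged = suspicious_writes(writes, sandbox)
```

Because the monkey program never clicks on install dialogs, any executable that lands outside the sandbox is taken as a sign of exploitation, with no signature needed.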
Redirection analysis…
• Stage 1 – act as front end content
• Traffic redirection – tracked with a BHO
  (Browser Helper Object).
• Recursive scanning.
• Construction of topology graphs based on
  traffic redirection.
• Identify web pages that actually perform
  the exploit and stop redirection.
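A topology graph can be built from the recorded redirections roughly as follows. The input format (pairs of source and target URLs) and the function names are assumptions made for this sketch, and the `.example` hosts are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_topology(redirects):
    """Turn (source_url, target_url) pairs into a site-level directed graph."""
    graph = defaultdict(set)
    for source_url, target_url in redirects:
        src = urlsplit(source_url).hostname
        dst = urlsplit(target_url).hostname
        if src and dst and src != dst:
            graph[src].add(dst)
    return graph


def exploit_providers(graph):
    """Sites that receive redirected traffic and redirect no further."""
    targets = set().union(*graph.values()) if graph else set()
    return sorted(site for site in targets if not graph.get(site))


redirects = [
    ("http://a.example/p", "http://b.example/q"),
    ("http://b.example/q", "http://c.example/x"),
    ("http://d.example/", "http://c.example/x"),
]
graph = build_topology(redirects)
providers = exploit_providers(graph)
```

In this toy data, `a.example` and `d.example` act as front-end content providers and `c.example` is where redirection stops.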
Topology graphs…
Anti-Exploit Process…
• Generating Input URL Lists – source
  – Suspicious URLs for analysis.
  – Popular web sites – if attacked can potentially infect
    a large population. (measured search engines).
  – URLs of more localized scope – within organizations
    or based on history etc…
• Acting on output exploit-URL data
  – Stage 1 – output-exploit-URLs.
  – Stage 2 – output-traffic-redirection topology graph.
  – Stage 3 – output-zero-day exploit URLs & topology
    graph.
Node ranking…

• Ranking criteria: connection counts; no. of exploit URLs.
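Both ranking criteria can be computed from simple tallies. The data shapes below (site-level edges, and a mapping from site to its exploit URLs) are assumptions for this sketch, as are the function names.

```python
def rank_by_connections(edges):
    """Rank sites by how many redirection links touch them."""
    counts = {}
    for source, target in edges:
        counts[source] = counts.get(source, 0) + 1
        counts[target] = counts.get(target, 0) + 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def rank_by_exploit_urls(exploit_urls_by_site):
    """Rank sites by the number of exploit URLs they host."""
    sizes = ((site, len(urls)) for site, urls in exploit_urls_by_site.items())
    return sorted(sizes, key=lambda item: (-item[1], item[0]))


edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b")]
by_connections = rank_by_connections(edges)
by_urls = rank_by_exploit_urls({"c": ["u1", "u2", "u3"], "b": ["u4"]})
```

Sites at the top of either list are the natural first targets for investigation or takedown, since blocking them cuts the most redirection paths.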
Node ranking contd…
Zero day exploit detection…
• Important observations:
  – Monitoring easy-to-find exploit-URLs is effective.
  – Monitoring content providers with well known URLs is
  – Monitoring highly ranked & advanced exploit URLs is
• Identifying HoneyMonkeys
   – Targeting HoneyMonkey IP addresses.
   – Performing a test to determine if a human is present.
   – Detecting the presence of a VM or the HoneyMonkey
• Exploiting without triggering HoneyMonkey
  detection – code within browser sandbox.
• Randomizing the attacks.
• VSED – vulnerability specific exploit detector.
•   Automatic.
•   Scalability.
•   Non-signature based approach.
•   Stage-wise.
•   Zero-day exploits.
Potemkin                         HoneyMonkey
1) Server based.                 1) Client based.
2) Based on VMs.                 2) Based on VMs.
3) Greater interaction.          3) Better scalability.
4) Stateless approach.           4) Stateful approach.
5) Can handle multiple attacks   5) More suitable for
   as it is able to analyze         individual attacks.
   the same.
 The world of Security
continues to be dark…
