Design of x86 Emulator for Generic Unpacking


         Chandra Prakash
The problem
   A large number of detections are still
    based on static signatures, e.g.,
    MD5, CRC32, etc.
   Malware has cleverly evolved to evade
    signature based detections by use of
The problem, contd…
   It is possible to write custom unpacking
    routines for each packer
   Cryptanalysis or X-Ray can also be used
   But the number of packers and
    variations within each packer type are
    too many, e.g., the current version range
    for UPX is 1.x-3.x and for FSG is 1.x-2.x
   Moreover, packing can be applied in
    several recursive layers
A Solution - Emulation
   Due to the nature of the problem, it is
    desirable to have a general purpose
   Emulation provides a “fairly” general
    purpose solution that leads to the term
    Generic Unpacking
What is Emulation?
   Wikipedia definition is pretty clear
       “An emulator duplicates (provides an emulation
        of) the functions of one system using a different
        system, so that the second system behaves like
        (and appears to be) the first system. This focus on
        exact reproduction of external behavior is in
        contrast to simulation, which can concern an
        abstract model of the system being simulated,
        often considering internal state.”
Emulation –
where else is it used?
   Supporting cross-platform applications
   Controlled and secure execution of
    untrusted applications
   And of course, dynamic behavioral
    analysis of malware and packed
    malware detection via generic
    unpacking
   Etc.
Emulation – to what degree?
   Full emulation – Emulate everything;
    Application as well as the Operating System
       E.g., VMware and Virtual PC
   Application Only - Emulate application level
    instruction set and System Call interface
       E.g., Wow64, Win32 emulation on 64-bit Windows
   Our emulator for Generic Unpacking is
    Application Only
Emulator Components
   A software implementation of the subset of
    hardware, operating system and application
    environment needed for running an application.
   The hardware components include the CPU,
    registers and interrupt vector table.
   The operating system components include the
    PE loader, virtual memory manager and
    structured exception handling (SEH).
   The application environment includes input parameter
    and environment variable support, heap, stack,
    process environment block (PEB), thread information
    block (TIB), and function hooks for spoofing execution
    references into system DLLs.
Emulator Components
   [Class diagram; layout not recoverable. Classes shown: PELoader,
    MemoryManager, X86CPU, X86Registers («struct»), SEHHandler, HookList,
    Stack, PEB_LDR_DATA («struct»). Operations shown include loadPEImage(),
    createProcess(), loadImage(), run(), executeOneInstruction(),
    readByte(), writeByte(), virtualAlloc(), heapDestroy() and
    generateAccessViolationException().]
Emu Components - PE Loader
   The very first step in a target’s emulation
   Create a memory-mapped image as per
    Windows PE specifications.
   Calculate virtual mapped size
   Allocate a contiguous buffer based on the virtual
    mapped size and then copy the PE headers and
    section data into aligned sections
   Fix imports from primary module
   Fix relocations
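
The size calculation and copy steps above can be sketched roughly as follows. This is only an illustration that assumes the headers are already parsed; the `Section` tuple, `map_image` and the sample values are hypothetical, and import and relocation fix-ups are left out.

```python
from collections import namedtuple

# Hypothetical, already-parsed section header fields (not a real PE parser).
Section = namedtuple("Section", "virtual_address virtual_size raw_data")

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def map_image(headers, sections, section_alignment=0x1000):
    """Lay out headers and section data the way they would sit in memory."""
    # Virtual mapped size: end of the highest section, rounded up.
    end = max(s.virtual_address + max(s.virtual_size, len(s.raw_data))
              for s in sections)
    image = bytearray(align_up(end, section_alignment))
    image[:len(headers)] = headers
    for s in sections:
        start = s.virtual_address
        image[start:start + len(s.raw_data)] = s.raw_data
    return image

image = map_image(b"MZ" + bytes(62),
                  [Section(0x1000, 0x20, b"\x90\xc3"),
                   Section(0x2000, 0x1800, b"data")])
```

Bytes not covered by raw data stay zero, which matches how the uninitialized tail of a section is expected to look.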
Emu Components - Registers
   There are eight 32-bit general purpose
    registers (EAX, EBX, ECX, EDX, EBP, ESP,
    ESI, EDI)
   Six 16-bit segment registers (CS, SS, DS, ES,
    FS, GS) and the hardware debug registers
    DR0-DR3, DR6, DR7
   EFLAGS and EIP registers
   It is an added benefit to also support FPU
    instructions and extensions to the x86
    architecture, such as MMX, SSE, SSE2,
    SSE3[10] and 3DNow! instructions
Emu Components - CPU
   Fetch instructions from the virtual
    memory address space of the target
   Decode instruction; find instruction
    type, get operands
   Execute instruction; calculate results
    and store
   Move on to the next instruction as
    indicated by EIP
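
A toy version of this fetch, decode, execute cycle is sketched below. It is not the emulator described here: `TinyCPU` is a hypothetical class that handles only four one-opcode instructions over a flat byte buffer, and HLT is used merely as a convenient way to end the demo loop.

```python
import struct

class TinyCPU:
    """Toy fetch-decode-execute loop over a flat bytearray (illustrative only)."""

    def __init__(self, memory, eip):
        self.mem, self.eip, self.eax, self.halted = memory, eip, 0, False

    def execute_one_instruction(self):
        opcode = self.mem[self.eip]                  # fetch
        if opcode == 0x90:                           # NOP
            self.eip += 1
        elif opcode == 0x40:                         # INC EAX (wraps at 32 bits)
            self.eax = (self.eax + 1) & 0xFFFFFFFF
            self.eip += 1
        elif opcode == 0xB8:                         # MOV EAX, imm32
            (self.eax,) = struct.unpack_from("<I", self.mem, self.eip + 1)
            self.eip += 5
        elif opcode == 0xF4:                         # HLT: stops the toy loop
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not emulated")

# MOV EAX, 0xFFFFFFFF ; INC EAX ; NOP ; HLT
cpu = TinyCPU(bytearray(b"\xb8\xff\xff\xff\xff\x40\x90\xf4"), eip=0)
while not cpu.halted:
    cpu.execute_one_instruction()
```

Each handler advances EIP by the instruction length, so the loop always continues at the instruction EIP points to.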
Emu Components – Interrupt
   INT N generates a software interrupt, with N
    in the range 0-255
   Execution of INT N results in a software
    exception in the application
   From user mode only a subset of these
    are allowed; all others result in an access
    violation exception
Emu Components – Interrupt
User mode exceptions for INT N as noted on
Windows XP-SP2 (interrupt numbers in hexadecimal)
Interrupt Number       Exception thrown
3, 2d                  Breakpoint
4                      Integer Overflow
2a, 2b, 2c, 2e         None
All others             Access Violation
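
One simple way to encode this table in an emulator is a lookup with a default. The sketch below uses hypothetical names (`USER_MODE_INT_EXCEPTIONS`, `exception_for_int`) and takes the fallback for unlisted vectors to be an access violation, as stated on the previous slide.

```python
# Values copied from the table above; interrupt numbers are hexadecimal.
USER_MODE_INT_EXCEPTIONS = {
    0x03: "Breakpoint", 0x2D: "Breakpoint",
    0x04: "Integer Overflow",
    0x2A: None, 0x2B: None, 0x2C: None, 0x2E: None,   # no exception raised
}

def exception_for_int(n):
    """Return the exception name for INT n, or None if no exception is raised."""
    if not 0 <= n <= 0xFF:
        raise ValueError("interrupt number must be in 0-255")
    return USER_MODE_INT_EXCEPTIONS.get(n, "Access Violation")
```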
Emu Components – Virtual
Memory Manager
   Manages Virtual Memory used by the target at the
    very lowest level
   Maintains memory regions
       Each region consists of a contiguous sequence of pages, e.g.,
        the PE image region
   Each page has its own allocation and protection
       Allocation types include reserved, committed and free
       Protection types include read, write, execute, etc.
   An access violation is generated when a memory reference
    is not compatible with the allocation and protection
    type for the region
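
A minimal sketch of per-page protection checks follows. It only loosely mirrors the MemoryManager from the class diagram: the method names, the protection constants and the `AccessViolation` exception are assumptions, and reserved versus committed state is not modelled.

```python
PAGE_SIZE = 0x1000
READ, WRITE, EXECUTE = 1, 2, 4

class AccessViolation(Exception):
    pass

class MemoryManager:
    def __init__(self):
        self.pages = {}   # page number -> [protection, bytearray]

    def virtual_alloc(self, address, size, protection):
        """Commit every page touched by [address, address + size)."""
        first = address // PAGE_SIZE
        last = (address + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.pages[page] = [protection, bytearray(PAGE_SIZE)]

    def _page(self, address, needed):
        entry = self.pages.get(address // PAGE_SIZE)
        if entry is None or not entry[0] & needed:
            raise AccessViolation(hex(address))
        return entry[1]

    def read_byte(self, address):
        return self._page(address, READ)[address % PAGE_SIZE]

    def write_byte(self, address, value):
        self._page(address, WRITE)[address % PAGE_SIZE] = value & 0xFF
```

A reference to an unallocated page and a write to a read-only page both end in the same exception, which the SEH component would then dispatch.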
SEH handling
   Most commonly used by malware to obfuscate
    the execution path by deliberate generation
    and handling of software exceptions.
   Typically used instructions are:
       Single step (INT1) and breakpoint (INT3)
       DIV/IDIV and INTO, which generate
        arithmetic divide and integer overflow
        exceptions.
Stack
   The stack is a contiguous memory region that
    serves among other things as a memory work
    area for parameters passed in function calls
    and SEH chain.
   There exists one stack for each thread.
   It is implemented in an inverted manner so
    that it grows in the direction of decreasing
    memory address.
   The stack parameters, e.g., base, limit,
    address of top level exception handler frame,
    should be appropriately set in TIB
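
The downward growth can be illustrated with a small sketch. The `Stack` class below and its bounds checks are assumptions for illustration; `base` and `limit` play the role of the values that would be written into the TIB.

```python
import struct

class Stack:
    """32-bit stack that grows toward lower addresses (illustrative only)."""

    def __init__(self, base, size):
        self.base = base            # highest address; ESP starts here
        self.limit = base - size    # lowest valid address
        self.mem = bytearray(size)
        self.esp = base

    def push(self, value):
        if self.esp - 4 < self.limit:
            raise MemoryError("stack overflow")
        self.esp -= 4               # pushing decreases ESP
        struct.pack_into("<I", self.mem, self.esp - self.limit,
                         value & 0xFFFFFFFF)

    def pop(self):
        if self.esp + 4 > self.base:
            raise MemoryError("stack underflow")
        (value,) = struct.unpack_from("<I", self.mem, self.esp - self.limit)
        self.esp += 4               # popping increases ESP
        return value

stack = Stack(base=0x00130000, size=0x1000)
stack.push(1)
stack.push(2)
```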
Heap
   The heap enables efficient memory allocations at a
    much finer granularity than the page
    granular allocations of the VirtualAlloc call.
   To support Win32 heap related calls made by
    the target, e.g., HeapAlloc, HeapFree, etc., a
    simulation of them needs to be provided.
   The heap is implemented as a wrapper
    around page granular memory allocation
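
As a rough sketch of such a wrapper, the bump allocator below carves small blocks out of page-sized regions. All names are hypothetical, freed blocks are only forgotten rather than reused, and the unused tail of a region is abandoned when a new region is needed.

```python
PAGE_SIZE = 0x1000

def make_page_allocator(start=0x00A00000):
    """Stand-in for a page-granular allocator (hypothetical base address)."""
    next_free = [start]
    def page_alloc(size):
        base = next_free[0]
        next_free[0] += size
        return base
    return page_alloc

class Heap:
    def __init__(self, page_alloc):
        self.page_alloc = page_alloc
        self.cursor = self.end = 0
        self.blocks = {}                      # address -> size of live blocks

    def heap_alloc(self, size):
        size = max(8, (size + 7) & ~7)        # 8-byte granularity
        if self.cursor + size > self.end:     # current region exhausted
            region = (size + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
            self.cursor = self.page_alloc(region)
            self.end = self.cursor + region
        address, self.cursor = self.cursor, self.cursor + size
        self.blocks[address] = size
        return address

    def heap_free(self, address):
        """Return True if address was a live block."""
        return self.blocks.pop(address, None) is not None

heap = Heap(make_page_allocator())
first = heap.heap_alloc(10)
second = heap.heap_alloc(24)
```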
Thread Information Block(TIB)
   For each thread there is a TIB structure; its address is stored at
    FS:[18h] (the Self field) in each thread.
           +0x000 ExceptionList        : Ptr32 _EXCEPTION_REGISTRATION_RECORD
           +0x004 StackBase            : Ptr32 Void
           +0x008 StackLimit           : Ptr32 Void
           +0x00c SubSystemTib         : Ptr32 Void
           +0x010 FiberData            : Ptr32 Void
           +0x010 Version              : Uint4B
           +0x014 ArbitraryUserPointer : Ptr32 Void
           +0x018 Self                 : Ptr32 _NT_TIB

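   The first field, ExceptionList, contains the address of the top level
    exception handler frame, represented by an
    _EXCEPTION_REGISTRATION_RECORD
   StackBase and StackLimit contain the upper bound and lower bound of the
    thread’s stack, respectively; the stack grows down from StackBase
    toward StackLimit.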
   Address of PEB can be obtained as FS:[30h]
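
The layout above can be mocked up as a small byte buffer that FS-relative reads index into. In this sketch `TIB_BASE`, `build_tib` and `read_fs_dword` are hypothetical; the PEB address reuses the 7FFDF000 value that appears in Example 3.

```python
import struct

TIB_BASE = 0x7FFDE000   # hypothetical address chosen for illustration
PEB_BASE = 0x7FFDF000   # value seen in the Example 3 listing

def build_tib(exception_list, stack_base, stack_limit):
    """Fill in the TIB fields that unpackers commonly read."""
    tib = bytearray(0x34)
    struct.pack_into("<III", tib, 0x00, exception_list, stack_base, stack_limit)
    struct.pack_into("<I", tib, 0x18, TIB_BASE)   # Self
    struct.pack_into("<I", tib, 0x30, PEB_BASE)   # pointer to the PEB
    return tib

def read_fs_dword(tib, offset):
    """Emulates MOV reg, DWORD PTR FS:[offset]."""
    return struct.unpack_from("<I", tib, offset)[0]

# 0xFFFFFFFF marks the end of the SEH chain.
tib = build_tib(0xFFFFFFFF, 0x00130000, 0x0012E000)
```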
Process Environment Block
   For each user mode process there is one PEB
   Some of the important fields accessed by malware are:
    BeingDebugged, ImageBaseAddress, InLoadOrderModuleList,
    InMemoryOrderModuleList and InInitializationOrderModuleList
   The IsDebuggerPresent Win32 API simply returns the value of the
    BeingDebugged field of the PEB. Malware uses it to detect a
    debugger’s presence as one of its anti-debugging tricks
         +0x002 BeingDebugged   : UChar //In PEB
   The sorted list of modules is maintained in three different
    LIST_ENTRY type data structures in PEB_LDR_DATA
      +0x00c InLoadOrderModuleList : _LIST_ENTRY
      +0x014 InMemoryOrderModuleList : _LIST_ENTRY
      +0x01c InInitializationOrderModuleList : _LIST_ENTRY
Function hooks
   In an application-only emulator, any call made by the
    malware into a dependent system module like
    kernel32.dll is intercepted and a corresponding
    spoofed implementation is provided
   Some of the hooked functions include: LoadLibraryA/W,
    GetProcAddress, GetModuleHandleA/W, VirtualAlloc,
    VirtualFree, HeapAlloc, HeapFree, GetVersionExA/W
   A default un-emulated function hook should also
    be provided that gets called when an unimplemented
    import function is encountered
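
A hook table with a default fallback might look like the sketch below. The `HookList` interface shown is an assumption (only the class name comes from the diagram), and the kernel32.dll base address and spoofed GetModuleHandleA body are invented for the demo.

```python
class HookList:
    def __init__(self):
        self.hooks = {}
        self.unemulated = []      # calls that fell through to the default hook

    def register(self, dll, name, handler):
        self.hooks[(dll.lower(), name)] = handler

    def dispatch(self, dll, name, *args):
        handler = self.hooks.get((dll.lower(), name))
        if handler is None:
            return self.default_hook(dll, name)
        return handler(*args)

    def default_hook(self, dll, name):
        """Record the un-emulated call; the caller can treat it as a stop condition."""
        self.unemulated.append(f"{dll}!{name}")
        return 0

loaded_modules = {"kernel32.dll": 0x77E80000}   # hypothetical load address

hooks = HookList()
hooks.register("kernel32.dll", "GetModuleHandleA",
               lambda module: loaded_modules.get(module.lower(), 0))
```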
Stop Conditions
   Ideally the emulator should be stopped at the original entry point (OEP)
   Finding exact OEP in a generic way is non-
   Typical conditions other than the target
    initiated explicit termination are:
       Encountering an un-emulated system call in a
        dependent module.
       Unhandled exception for which no SEH handler
        was found. Some of these exceptions include
        invalid memory read, write, execute, divide by
        zero, integer overflow.
Stop Conditions…Contd
   Encountering an un-emulated or illegal
   A configured timeout.
   Maximum number of instructions being
   Attempt to load a dll that could not be
   Too many dlls being loaded by the target in
    explicit load module.
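
Several of these conditions can be combined in one driver loop. The sketch below is an assumption about how that might be organized: `run` and its `step` callback are hypothetical, with `step` returning None to continue or a string naming the stop reason (for example an unhandled exception or an un-emulated call).

```python
import time

def run(step, max_instructions=1_000_000, timeout_seconds=5.0):
    """Call step() until it reports a stop reason or a limit is hit.

    Returns (reason, number_of_instructions_executed).
    """
    deadline = time.monotonic() + timeout_seconds
    for executed in range(1, max_instructions + 1):
        reason = step()
        if reason is not None:
            return reason, executed
        if time.monotonic() > deadline:
            return "timeout", executed
    return "instruction limit", max_instructions
```

Checking the clock on every instruction is wasteful; a real loop would more likely test it every few thousand instructions.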
Emulator fine tuning due to
malware unique characteristics
   Practical constraints due to performance
    optimizations and undocumented features would
    allow only limited implementation of the emulator.
   Once the core emulator system is ready, developing
    a robust emulator is an iterative process driven by
    minor fine tuning of it for unique characteristics of
    supported packers and symptoms exhibited by the
    malware test-bed.
   The examples that follow describe some of the cases
    experienced with malware samples that led to the
    improvement of our emulator.
   The cases described in these examples are no way
Example 1 – Setting Initial Stack

        0041C25A   CALL 0041C25F
        0041C25F   PUSH EBP
        0041C260   MOV EBX,DWORD PTR SS:[ESP+8]
        0041C264   MOV EBP,DWORD PTR SS:[ESP+4]
        0041C268   SUB DWORD PTR SS:[ESP+4],1A4AF

   At address 0041C260, the MOV instruction reads
    [ESP+8], the value at the top of the initial stack.
   This value is the return address that follows the CALL
    instruction in kernel32.dll that “calls” the malware
    entry point.
   The return address actually ends up calling
Example 2 – Module load
address alignment
        004A1584 MOV EBX,DWORD PTR SS:[ESP+24] ; EBX=77E8141A
        004A1588 AND EBX,FFE00000 ; EBX=77E00000
        004A16C4 ADD EBX,10000
        004A16CA JE SHORT 004A16F7
        004A16CC CMP WORD PTR DS:[EBX],5A4D
        004A16D1 JNZ SHORT 004A16C4
   At 004A16C4 the EBX value is incremented by 10000h (64 KB), the system
    allocation granularity
   At 004A16CC it compares the WORD at the address in EBX with 5A4D
    (ASCII ‘MZ’ as stored in memory), which is the start marker of a
    PE image.
   If the marker is not found at the address in EBX, the JNZ at 004A16D1
    loops back to 004A16C4; once it is found, execution falls through.
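
The scan in this listing can be mirrored in a few lines, which also suggests why module load addresses in the emulator would need the same 64 KB alignment. `find_module_base` and the `read_word` callback are hypothetical names; the sample address 77E8141A is taken from the listing, while the module base used in the demo is invented.

```python
ALLOCATION_GRANULARITY = 0x10000

def find_module_base(address_in_module, read_word):
    """Mask the address, then step 64 KB at a time looking for 'MZ'."""
    ebx = address_in_module & 0xFFE00000          # AND EBX, FFE00000
    while True:
        ebx = (ebx + ALLOCATION_GRANULARITY) & 0xFFFFFFFF   # ADD EBX, 10000
        if ebx == 0:                              # JE: wrapped around, give up
            return None
        if read_word(ebx) == 0x5A4D:              # CMP WORD PTR [EBX], 5A4D
            return ebx

# Demo memory: a PE image is pretended to start at 0x77E80000.
fake_read_word = lambda address: 0x5A4D if address == 0x77E80000 else 0
base = find_module_base(0x77E8141A, fake_read_word)
```

Note that the scan reads memory at every 64 KB step on the way, so the emulator has to tolerate or handle reads of addresses where nothing is mapped.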
Example 3 – Startup Register
       31428200 PUSH ED01C390
       31428205 MOV EAX,ESP
       31428207 CALL EAX
       0012FFC0 NOP
       0012FFC1 RETN
       31428209 XCHG EAX,EBX ; EAX=7FFDF000,
       3142820A POP EBX
   At 31428209 EBX is referenced; its value
    is expected to equal the PEB address of the
    program
Example 4 – Handling DLL
   For the correct emulation of a dll,
    before the entry point function DllMain
    gets called, its input parameters must
    be set in the stack as in Windows.
       BOOL WINAPI DllMain(
         HINSTANCE hinstDLL,
         DWORD fdwReason,
         LPVOID lpvReserved
       );
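
Setting up that call frame could look like the following sketch. It assumes the stdcall convention (arguments pushed right to left, then the return address); `push_dllmain_frame`, the flat `stack` buffer and the sample addresses are hypothetical.

```python
import struct

DLL_PROCESS_ATTACH = 1

def push_dllmain_frame(stack, esp, hinst_dll, return_address,
                       reason=DLL_PROCESS_ATTACH, reserved=0):
    """Push lpvReserved, fdwReason, hinstDLL and the return address; return new ESP."""
    for value in (reserved, reason, hinst_dll, return_address):
        esp -= 4
        struct.pack_into("<I", stack, esp, value)
    return esp

stack = bytearray(0x100)          # tiny stand-in for the emulated stack
esp = push_dllmain_frame(stack, 0x100, hinst_dll=0x10000000,
                         return_address=0x7C901000)
frame = struct.unpack_from("<4I", stack, esp)
```

On entry, `[ESP]` holds the return address and `[ESP+4]`, `[ESP+8]`, `[ESP+0Ch]` hold the three parameters in declaration order.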
Example 5 – Setting register
values before calling SEH handler
         0048C093   CMP AL,4
         0048C095   JNZ SHORT 0048C09B
         0048C097   NOP
         0048C098   NOP
         0048C099   RETN

   The SEH handler’s second instruction at 0048C095 is a conditional
    jump whose outcome depends on the value of AL, which is compared
    with 4 at 0048C093.
   In real Windows, EAX is set to zero just before the SEH handler gets
    control
   Therefore, before the SEH handler gets control, the other registers
    should also be set up as they are in Windows.
Example 6 – Setting top level
exception handler in SEH
         004141EF SUB EDX,EDX
         004141F1 MOV EAX,DWORD PTR FS:[EDX]
         004141F4 MOV ESP,DWORD PTR DS:[EAX]
         004141F6 POP DWORD PTR FS:[EDX]
         004141F9 POP EAX
         004141FA POP EBP
         004141FB RETN
   Windows also registers another handler on top before the
    application handler gets control
   The malware had already placed a return address on the stack
    that gets executed after the RETN at 004141FB
   At 004141F4 it skips over the top level SEH handler and positions
    ESP at the SEH frame for this handler
   At 004141F6 the two top SEH handlers are torn down and after
    004141FB execution resumes at the location specified by ESP, which
    was last updated at 004141FA
Example 7 – Check for
BeingDebugged field in PEB
        3142821B MOV EAX, DWORD PTR FS:[18]
        31428220 MOV EAX, DWORD PTR DS:[EAX+30]
        31428223 MOVZX EAX, BYTE PTR DS:[EAX+2]
        31428227 CMP EAX, 0
        3142822A JNZ SHORT 3142826E
        3142822C CALL 31428231
        31428231 POP EBP
   At 3142821B the address of the TIB is obtained, which is
    used to get the address of the PEB at 31428220
   At 31428223 the BeingDebugged field of the PEB is read
    to evaluate the condition of the branch instruction at
    3142822A
Example 8 – Check for loader
lists in PEB
     0044D0A5 MOV EAX,DWORD PTR FS:[30]
     0044D0AB TEST EAX,EAX
     0044D0AD JS SHORT 0044D0BB

   At 0044D0A5 the PEB is referenced.
   At 0044D0AF, 0044D0B2 and 0044D0B5, PEB_LDR_DATA,
    InInitializationOrderModuleList and
    InInitializationOrderModuleList.Flink respectively are referenced
   The malware happens to be referencing the kernel32.dll load
    information in its dependent module list sorted on initialization
    order
Example 9 – Reference to
Thread Local Storage
       004033FA MOV EAX,DWORD PTR DS:[4503D4]
       00403400 TEST CL,CL
       00403402 JNZ SHORT 0040341A
       00403404 MOV EDX,DWORD PTR FS:[2C]
       0040340B MOV EAX,DWORD PTR DS:[EDX+EAX*4]
       0040340E RETN
   At 00403404 the address of the thread local
    storage pointer array, held in FS:[2Ch], is
    copied into EDX
   The next instruction at 0040340B returns in
    EAX the TLS pointer indexed by the
    previous value of EAX
Example 10 – Normalizing
malformed PEs in loader
   All Win32 PE executables are expected to
    follow the PE format specifications in the
    strictest sense
   Yet, it is seen that many malware samples do
    not conform to these formal guidelines and
    are still allowed to be run by the Windows
   In general a malware should be loaded by the
    emulator as long as Windows loader accepts
    it by relaxing constraints on these kind of
Example 10 – Normalizing
malformed PEs in loader…contd
Structure         Field                   Value
Dos Header        e_lfanew                0x10
Optional Header   SizeOfCode              0x4c454e52
Optional Header   SizeOfInitializedData   0x442e3233
Optional Header   AddressOfEntryPoint     0x11a4
Section Header    PointerToRawData        0x10
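
For the e_lfanew case in this table, a lenient header lookup might be sketched as below. `locate_pe_header` is a hypothetical helper; it checks only bounds and the two signatures and does not model what the Windows loader actually validates.

```python
import struct

def locate_pe_header(data):
    """Return the PE header offset, accepting an e_lfanew inside the DOS header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # A strict parser might reject e_lfanew < 0x40; that constraint is relaxed here.
    if e_lfanew + 4 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    return e_lfanew

# Synthetic image whose PE header overlaps the DOS header, as in the table.
sample = bytearray(0x100)
sample[:2] = b"MZ"
sample[0x10:0x14] = b"PE\0\0"
struct.pack_into("<I", sample, 0x3C, 0x10)
offset = locate_pe_header(bytes(sample))
```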
Emulator Performance
   In plain emulation, instructions are
    executed in software
   Plain emulation is hundreds of times
    slower than native execution
   It is not well suited for malware that
    requires emulation of hundreds of
    millions of instructions
Emulator Performance Optimizations
– Dynamic Binary Translation (DBT)
   Frequently executed instructions, e.g.,
    decryption loop, are translated into native
   Repeated execution of the same set of instructions
    above a certain threshold causes their
    translated counterpart to be executed instead
   DBT is only about ten times slower than
    native execution
Some more DBT details
   Code is partitioned into a sequence of
    Basic Blocks (BB)
   Each BB is self contained and does not
    contain any branch instructions, except
    possibly as its last instruction
   For each BB a corresponding native
    translation is obtained
   There is a performance hit at the time
    of translation but that is a one-time cost
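
The threshold mechanism can be sketched as a per-block counter in front of a translation cache. Everything here is hypothetical (`BlockCache`, the threshold of 3), and the "translated" block is just a Python closure standing in for generated native code.

```python
class BlockCache:
    """Interpret a block until it is hot, then run its translated version."""

    def __init__(self, interpret, translate, threshold=3):
        self.interpret, self.translate = interpret, translate
        self.threshold = threshold
        self.counts, self.translated = {}, {}

    def execute(self, block_address, state):
        fast = self.translated.get(block_address)
        if fast is not None:
            return fast(state)                    # translated path
        count = self.counts.get(block_address, 0) + 1
        self.counts[block_address] = count
        if count >= self.threshold:               # block became hot
            self.translated[block_address] = self.translate(block_address)
        return self.interpret(block_address, state)

log = []

def interpret(address, state):
    log.append("interpreted")
    return state + 1

def translate(address):
    def native(state):                            # stand-in for native code
        log.append("translated")
        return state + 1
    return native

cache = BlockCache(interpret, translate)
state = 0
for _ in range(5):
    state = cache.execute(0x00401000, state)
```

With a threshold of 3, the first three executions are interpreted and later ones use the cached translation, so the translation cost is paid once per block.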
Page fault handler based
   All memory writes from a packed program are
    monitored from kernel until an execute is issued in
    the modified monitored memory regions
   The page fault handler based unpacking system
    yields maximum speed improvements as the malware
    is allowed to run natively on the host machine and in
    that sense does not require any kind of emulation.
   But its implementation is discouraged, as it requires
    unconventional modification of the page fault
    interrupt handler in the kernel and may not even
    work on 64-bit Vista because of PatchGuard
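
Only the bookkeeping behind this idea is sketched below, as a user-mode model: pages that were written are remembered, and an execute inside one of them is flagged. `WriteThenExecuteMonitor` is a hypothetical name, and the real technique relies on page protections and a kernel page fault handler, neither of which is shown.

```python
PAGE_SIZE = 0x1000

class WriteThenExecuteMonitor:
    def __init__(self):
        self.written_pages = set()

    def on_write(self, address):
        """Remember that the page containing address was modified."""
        self.written_pages.add(address // PAGE_SIZE)

    def on_execute(self, address):
        """Return True when execution enters a previously modified page."""
        return address // PAGE_SIZE in self.written_pages

monitor = WriteThenExecuteMonitor()
monitor.on_write(0x00405123)                    # unpacking stub writes decoded code
hit = monitor.on_execute(0x00405000)            # jump into the written page
miss = monitor.on_execute(0x00401000)           # still running the original stub
```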
Thank You
