					A Museum of API Obfuscation on Win32

Masaki Suenaga, Senior Software Engineer
Symantec Security Response
November 2009
Packed executable files

• Each AV vendor has its own unpacker program
  – Unpacking code and data is the main purpose
  – APIs (imports) may be resolved
  – Unpacked programs can be disassembled without problems
• If unpacking fails, take a process memory dump
  – It can be used to determine whether the file is viral
  – It is not suitable for disassembly as-is
  – We need to adjust the PE (Portable Executable) header
  – We need to resolve APIs (imports)
  – We need a tool to facilitate these tasks

Contents

     1     File Image vs. Memory Image

     2     API Analysis

     3     Generating Memory Dumps

     4     Runtime API Address Resolution

     5     Basic API Obfuscation

     6     Advanced API Obfuscation

File image vs. memory image

  File image          Memory image
  PE header           PE header
  CODE                CODE
  Constant DATA       Constant DATA
  Resource DATA       Variable DATA
                      Resource DATA

  The blocks after the PE header are sections. The memory image is
  relocatable within the 2-GB address space.

* PE = Portable Executable

The Role of the Loader

  From EXE file to memory image:

  1. CreateProcess(“A.exe”): the EXE file is read.
  2. Allocate memory.
  3. Copy each section. File and memory alignments are different.
  4. Load imported DLLs.
  5. Resolve APIs (import addresses).

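
Steps 2 and 3 of the loader can be sketched in Python. This is a minimal illustration under stated assumptions, not the Windows loader: the `Section` tuple and `map_sections` function are hypothetical names, and header copying, relocation and page protection are left out.

```python
from collections import namedtuple

# Hypothetical, simplified section descriptor (a real PE section header has more fields).
Section = namedtuple("Section", "name raw_offset raw_size virtual_address virtual_size")

def map_sections(file_image, sections, image_size):
    """Copy each section from its file offset to its virtual address."""
    memory = bytearray(image_size)            # step 2: allocate zero-filled memory
    for s in sections:
        size = min(s.raw_size, s.virtual_size)
        data = file_image[s.raw_offset:s.raw_offset + size]
        memory[s.virtual_address:s.virtual_address + len(data)] = data  # step 3
    return memory

# Toy file: sections aligned to 200h in the file and to 1000h in memory.
file_image = bytearray(0x800)
file_image[0x200:0x204] = b"CODE"
file_image[0x600:0x604] = b"DATA"
sections = [
    Section(".text", 0x200, 0x400, 0x1000, 0x400),
    Section(".data", 0x600, 0x200, 0x2000, 0x1000),
]
memory = map_sections(bytes(file_image), sections, 0x3000)
print(bytes(memory[0x1000:0x1004]), bytes(memory[0x2000:0x2004]))
```

The same bytes end up at different offsets in the file and in memory, which is why a raw memory dump cannot be read with the file-oriented section table.
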
API Address Resolution in the Loader

   Example: TranslateMessage(&msg);

   LEA EAX, [EBP-20h]
   PUSH EAX
   CALL [01001270h]         ; 01001270h is an IAT entry

   MOV EDI, [01001270h]     ; the IAT entry can also be loaded into a register first
   LEA EAX, [EBP-20h]
   PUSH EAX
   CALL EDI

   By parsing the Import Directory Table, DLL names and API names can be obtained.
   IDA (Interactive Disassembler) shows API names as long as these tables are intact.

* IAT = Import Address Table

Development Environments and API Calls

• Microsoft Visual C/C++
  – An API call is compiled to CALL dword ptr [IAT entry]
  – The C++ runtime uses name mangling to encode return types and
    parameters
     • void * __cdecl operator new (unsigned int) is exported as ??2@YAPAXI@Z
     • IDA will show “operator new”
  – MFC (Microsoft Foundation Classes) does not export by name
     • CWnd::~CWnd() is mangled as ??1CWnd@@UAE@XZ
     • MFC42.DLL exports it as the 818th entry. The Import Directory Table only has “the 818th of
       MFC42.DLL”
     • But IDA has the list of entry numbers and API names, so it can show the API names
• Borland C/C++ and Delphi
  – An API call is compiled to CALL near ptr j_API_xxx, and at j_API_xxx: jmp
    dword ptr [IAT entry]
     • IDA shows j_API_xxx as an API name, and the IAT entry as __imp_API_name

Development Environments and API Calls

• Microsoft Visual Basic
  – A call to the VB runtime library is compiled to CALL dword ptr [IAT entry]
  – A call to a Windows API (any DLL function) has a special form

  Visual Basic declaration:

  Private Declare Function URLDownloadToFile Lib "urlmon" _
      Alias "URLDownloadToFileA" _
      (ByVal pCaller As Long, _
      ByVal szURL As String, _
      ByVal szFileName As String, _
      ByVal dwReserved As Long, _
      ByVal lpfnCB As Long) As Long

  Compiled form:

  00408094: db 'urlmon',0                  ; DLL name
  004080A0: db 'URLDownloadToFileA',0      ; API name
  004080B4: dd 408094 ; offset of 'urlmon'
  004080B8: dd 4080A0 ; offset of 'URLDownloadToFileA'
  004080BC: dd 040000
  004080C0: dd 4092D8 ; offset of a structure
  004080CC: URLDownloadToFileA proc near
            mov eax, dword_4092E0 ; initially zero; holds the API address after the first call
            or eax, eax
            jz short 4080D7 ; if not resolved yet, call the VB runtime library
            jmp eax
  004080D7: push 4080B4
            mov eax, offset DllFunctionCall ; it is JMP [__imp_DllFunctionCall]
            jmp eax
            URLDownloadToFileA endp

API Analysis

• Knowledge of APIs is not required
  – If a file is a variant of a well-known virus
  – If a file has some obvious strings used for malicious purposes
• Knowledge of APIs is required
  – If the strings are used for both benign and malicious purposes
     • Black-box testing will not reveal keystroke-stealing activity. Even if a file has a
       string used for an online game, we cannot be 100% sure the file is viral without finding the
       code that hooks the keystrokes. We can guess from the form of the parameters, but
       knowing which API is called confirms it.
  – When a precise analysis is necessary to explain and/or remediate
     • Precise information can help users avoid potential subsequent damage.
     • Without knowing the precise behavior, a remediation tool can miss something.

API Analysis vs. API Monitoring

• Many products and tools can monitor the APIs called
  – They hook some points of the API flow and log which APIs have been
    called
  – If a suspicious combination of APIs is caught, the running program can
    be marked as malicious
  – But monitoring programs cannot know which APIs will be called
• We want to know the APIs from a memory dump of a running
  process
  – In many cases, APIs have been resolved before the main
    program starts
  – We can do white-box analysis using the memory dump taken during a
    black-box analysis
  – We may be able to know what the program will do, even if some
    routines have not run in that environment.

Avoiding Guesswork

• We can guess an API from its parameters
  – 0x80000001 may be HKEY_CURRENT_USER, and the API may be
    related to Windows registry access.
  – “Software\Microsoft\Windows\CurrentVersion” suggests the API may
    be RegOpenKey
• Some APIs are hard to guess from their parameters
  – For example, GetLocalTime(&systime) just takes a pointer as its
    parameter
  – If we guess the APIs, we will write an ambiguous report
• A memory dump has hints for identifying the APIs
  – If the loader has resolved an API, an IAT entry has the address of the
    API. For example, if we find 0x77D16017 in the IAT, it means
    GetSystemMetrics on a certain version of Windows XP.
• However, some programs obfuscate the APIs

Motivation for API Obfuscation

• If obfuscated, a program takes more time to analyze.
  – If strings are also encrypted, it takes even more time.
• Even if a proprietary packer is used, a memory dump can be a
  great hint for analysis. Some authors want to obfuscate the APIs called
  even in memory dumps, for benign reasons.
  – An online game client vendor does not want you to hack the program
    and play an unfair game.
  – The author of an unfamiliar encryption and decryption algorithm wants to hide the APIs
    to make it harder to analyze.
  – The checking algorithm for product keys should be kept secret.
  – A producer wants to protect its unpatented technologies or trade secrets
    from technology theft.
• Sometimes redundant code is also inserted as a distraction.
• Commercial packers are often used for program obfuscation.

Motivation for Overcoming API Obfuscation

• Our job is not only to detect viruses, but also to explain their
  behavior to customers.
  – If we can provide precise information in a timely manner, our
    customers can be protected from further damage.
     • By closing ports, rejecting accesses at a proxy server, setting registry entries
       beforehand, changing the system time or regional settings, and so on.

• We have to create removal (remediation) tools.
  – Without precise knowledge of the behavior, the tools may miss
    something important.

  From now on, assume we are developing a de-obfuscation
  tool that generates an IDC script.
  An IDC script can change label names, add
  comments and so forth in IDA.

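
The output side of such a tool can be sketched in Python. This is a rough illustration, not the author's tool: `make_idc` is a hypothetical helper, the addresses and names are sample values, and `MakeName` / `MakeComm` are the classic IDC function names (newer IDA releases, to my understanding, prefer `set_name` / `set_cmt`). The sketch also assumes names and comments contain no double quotes.

```python
def make_idc(renames, comments):
    """Build an IDC script from {address: name} and {address: comment} maps."""
    lines = ["#include <idc.idc>", "", "static main()", "{"]
    for ea, name in sorted(renames.items()):
        lines.append('    MakeName(0x%08X, "%s");' % (ea, name))   # rename a label
    for ea, text in sorted(comments.items()):
        lines.append('    MakeComm(0x%08X, "%s");' % (ea, text))   # add a comment
    lines.append("}")
    return "\n".join(lines) + "\n"

# Sample values only: rename one pointer slot, comment one call site.
script = make_idc({0x004130B4: "CreateFileA"}, {0x00404A13: "SHFileOperationA"})
print(script)
```

Renaming is used when the label lies inside the analyzed module. A comment is the fallback when the resolved address lies outside it.
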
Resolving API Function Names without the Import Table

    CALL [01001480h]
    01001480h: dd 77D16017h

  Address ranges of the loaded DLLs:

    MSVCRT.DLL      77BE0000 – 77C32FFF
    GDI32.DLL       77C40000 – 77C7FFFF
    USER32.DLL      77D10000 – 77D9AFFF    (contains 77D16017)
    ADVAPI32.DLL    77DA0000 – 77E3AFFF
    KERNEL32.DLL    77E40000 – 77F4DFFF

  Parse the Export Table of USER32.DLL and get the
  API name GetSystemMetrics.

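
The lookup on this slide can be sketched as follows. The module ranges are the ones shown above. The `EXPORTS` dictionary is a tiny hypothetical stand-in for what parsing each DLL's Export Table would yield, and `resolve` is an illustrative name, not part of any real tool.

```python
import bisect

MODULES = [  # (start, end, name), sorted by start address
    (0x77BE0000, 0x77C32FFF, "MSVCRT.DLL"),
    (0x77C40000, 0x77C7FFFF, "GDI32.DLL"),
    (0x77D10000, 0x77D9AFFF, "USER32.DLL"),
    (0x77DA0000, 0x77E3AFFF, "ADVAPI32.DLL"),
    (0x77E40000, 0x77F4DFFF, "KERNEL32.DLL"),
]
# Assumed result of export-table parsing (only one entry, for the demo).
EXPORTS = {"USER32.DLL": {0x77D16017: "GetSystemMetrics"}}

def resolve(address):
    """Return (module, api_name) for a pointer found in the dump, or None."""
    starts = [m[0] for m in MODULES]
    i = bisect.bisect_right(starts, address) - 1   # last module starting at or below address
    if i < 0:
        return None
    start, end, module = MODULES[i]
    if address > end:                              # falls in a gap between modules
        return None
    name = EXPORTS.get(module, {}).get(address)
    return (module, name) if name else None

print(resolve(0x77D16017))
```

Only exact entry-point matches are resolved here. Pointers that land inside a function need a nearest-export search instead.
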
Trusting the Import Table

• The Import Table in a memory dump cannot necessarily be trusted.
  – The Import Table may have been deliberately removed.
  – The Import Table may have been replaced with a fake table.
     • IDA will show you fake API names.

• We should erase the Import Table of the memory dump.

Adjusting Image Base

• If relocation occurs, some operands of CPU instructions
  change in memory. But the Image Base in the PE header is left
  unchanged.

  Before relocation (Image Base is 100000h):
  MOV EAX, offset 102000h
  …
  102000: db 'test1'

  Relocated (the actual base is 130000h):
  MOV EAX, offset 132000h
  …
  132000: db 'test1'

  Both 'test1' strings are found at 2000h from the base address. But the operand
  of “MOV EAX” has been changed from 102000h to 132000h by the loader.
  IDA deems the string to exist at 32000h from the base address, because the Image
  Base value in the PE header is still 100000h. If the Image Base value is adjusted to
  130000h, IDA shows the correct string.

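
Adjusting the Image Base in a dump is a small header patch. The sketch below assumes a 32-bit PE32 image whose header is still intact. `set_image_base` is a hypothetical helper name, and the demo buffer is synthetic rather than a real dump.

```python
import struct

def set_image_base(dump, new_base):
    """Overwrite ImageBase in a PE32 header (in place); return the old value."""
    (e_lfanew,) = struct.unpack_from("<I", dump, 0x3C)      # offset of "PE\0\0"
    if dump[:2] != b"MZ" or dump[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE header found")
    opt = e_lfanew + 4 + 20                  # skip signature and COFF file header
    (magic,) = struct.unpack_from("<H", dump, opt)
    if magic != 0x10B:                       # 0x10B marks a PE32 optional header
        raise ValueError("not a PE32 image")
    (old,) = struct.unpack_from("<I", dump, opt + 28)       # ImageBase field
    struct.pack_into("<I", dump, opt + 28, new_base)
    return old

# Synthetic header, just large enough for the demo.
dump = bytearray(0x200)
dump[0:2] = b"MZ"
struct.pack_into("<I", dump, 0x3C, 0x80)
dump[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", dump, 0x80 + 24, 0x10B)
struct.pack_into("<I", dump, 0x80 + 24 + 28, 0x100000)

old = set_image_base(dump, 0x130000)
print(hex(old), hex(struct.unpack_from("<I", dump, 0x80 + 52)[0]))
```

The new base would normally be the address at which the dumped module was actually loaded.
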
Adjusting Section Tables

         File Alignment = 200h, Memory Alignment = 1000h

         Section[0]: Pointer To Raw Data = 200h, Virtual Address = 1000h
         Section[1]: Pointer To Raw Data = 600h, Virtual Address = 2000h

         Where IDA looks in the dump     Where the data actually is
         Section[0] at 200h (wrong)      Section[0] at 1000h
         Section[1] at 600h (wrong)      Section[1] at 2000h

 IDA thinks Section[0] is found at 200h and Section[1] at 600h.

 We should adjust Pointer To Raw Data to be identical to Virtual Address
 (200h becomes 1000h for Section[0], 600h becomes 2000h for Section[1]).

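
The section-table adjustment can be sketched the same way. This is a minimal illustration assuming an intact PE header in the dump. `align_raw_to_virtual` is a hypothetical name, and only Pointer To Raw Data is touched, as the slide describes.

```python
import struct

def align_raw_to_virtual(dump):
    """Set PointerToRawData = VirtualAddress for every section; return the count."""
    (e_lfanew,) = struct.unpack_from("<I", dump, 0x3C)
    coff = e_lfanew + 4                                   # COFF file header
    (num_sections,) = struct.unpack_from("<H", dump, coff + 2)
    (opt_size,) = struct.unpack_from("<H", dump, coff + 16)
    table = coff + 20 + opt_size             # section table follows the optional header
    for i in range(num_sections):
        hdr = table + i * 40                 # each section header is 40 bytes
        (virt_addr,) = struct.unpack_from("<I", dump, hdr + 12)
        struct.pack_into("<I", dump, hdr + 20, virt_addr)  # PointerToRawData
    return num_sections

# Synthetic header with the two sections from the slide.
dump = bytearray(0x400)
struct.pack_into("<I", dump, 0x3C, 0x80)
struct.pack_into("<H", dump, 0x84 + 2, 2)        # NumberOfSections
struct.pack_into("<H", dump, 0x84 + 16, 0xE0)    # SizeOfOptionalHeader
table = 0x84 + 20 + 0xE0
for i, (raw, va) in enumerate([(0x200, 0x1000), (0x600, 0x2000)]):
    struct.pack_into("<I", dump, table + i * 40 + 12, va)
    struct.pack_into("<I", dump, table + i * 40 + 20, raw)

count = align_raw_to_virtual(dump)
```
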
Recreating a Missing Header

• Once an EXE has started, its PE header is no longer
  necessary in memory.
• There are in fact some programs that erase the PE header in
  memory.
  – IDA cannot disassemble the dump if the PE header is missing.
• Some programs destroy the Section Tables in memory.
  – IDA may not show a correct disassembly.
• In both cases, we should recreate a PE header along with a
  single flat Section Table.

Searching Hidden Modules

• An EXE is loaded first. Some DLLs are loaded next. A DLL
  may load other DLLs if they are not loaded yet.
• All the EXE and DLLs (modules) loaded by the OS are
  tracked. They can be enumerated with the
  EnumProcessModules API. The details of each module can
  be retrieved with the GetModuleInformation API.
• Traditional packers restore the unpacked program in the same
  module, which can be enumerated. But some packers restore the
  unpacked program in allocated memory that is not
  enumerated by EnumProcessModules. We should search all
  memory blocks for hidden modules.

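
One simple heuristic for the search is to look for PE signatures in blocks that the module list does not cover. The sketch below is an assumption-laden illustration: `looks_like_pe` and `find_hidden_modules` are hypothetical names, the blocks are passed in as a plain dictionary (on a live system they would come from calls such as VirtualQueryEx and ReadProcessMemory), and a module whose header has been erased would not be found this way.

```python
import struct

def looks_like_pe(block):
    """True if the block starts with an MZ header that points at a PE signature."""
    if len(block) < 0x40 or block[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", block, 0x3C)
    return block[e_lfanew:e_lfanew + 4] == b"PE\0\0"

def find_hidden_modules(blocks, known_bases):
    """blocks: {base_address: bytes}. Return bases that look like PE images
    but are missing from the enumerated module list."""
    return sorted(base for base, data in blocks.items()
                  if base not in known_bases and looks_like_pe(data))

# Synthetic demo data: one enumerated module, one hidden module, one plain block.
pe = bytearray(0x100)
pe[0:2] = b"MZ"
struct.pack_into("<I", pe, 0x3C, 0x80)
pe[0x80:0x84] = b"PE\0\0"
blocks = {0x400000: bytes(pe), 0x370000: bytes(pe), 0x140000: bytes(0x100)}
hidden = find_hidden_modules(blocks, known_bases={0x400000})
print([hex(b) for b in hidden])
```
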
API Calls Made by Injected Threads

• A program (process) cannot normally access memory belonging to another
  process. But some APIs allow a process to read, write or allocate
  memory in another process.
• VirtualAllocEx() is used to allocate memory in a different process.
   – By copying code to the allocated memory and calling CreateRemoteThread(),
     a thread can be injected.
   – If a malicious thread is injected into Internet Explorer, the malicious code can
     act as a part of Internet Explorer, allowing it to bypass a firewall.
• Unless the injected thread is in a DLL, APIs must be resolved by the
  thread or its caller.
   – Some threads receive an API list as a parameter (assuming the system DLLs
     are loaded at the same addresses as in the caller's process); others
     resolve APIs and store the API addresses on their stack or heap. It is not easy to
     create a generic tool to resolve APIs used by injected threads, especially in
     the former case, because a tool cannot easily tell from memory dumps what was
     passed as a parameter to the injected thread.

Other Memory Blocks

  Process address space, with the role of each block:

  (vacant)
  Stack for main thread
  Allocated Memory    An allocated memory block can contain an injected thread or a
                      “stage” program / function table for API obfuscation.
  (vacant)
  EXE                 We primarily need the memory dumps of the EXE or
  DLL                 DLL, or both.
  Allocated Memory    An allocated memory block can contain a hidden module.
  (vacant)
  System DLLs         Imported DLLs are necessary to obtain API function names.

Classification of API Obfuscation



 API Obfuscation

   – Runtime API Address Resolution
       The API address is resolved after the main program has started.

   – Initialization-time API Obfuscation
       The API address is resolved before the main program starts.

Decoding API Names by Hashing

      Example from Trojan.Anicmoo:

      0000016F GetAPIaddress proc near
      0000016F arg_0 = dword ptr 14h ; DWORD checksum value
      0000016F arg_4 = dword ptr 18h ; virtual address of module (DLL)
      0000016F
      0000016F            push ebx
      00000170            push ebp
      00000171            push esi
      00000172            push edi
      00000173            mov ebp, [esp+arg_4] ; module handle (== VA of DLL image base)
      00000177            mov eax, [ebp+3Ch] ; position of PE header
      0000017A            mov edx, [ebp+eax+78h] ; Export Directory Table
      0000017E            add edx, ebp ; convert RVA to VA
      00000180            mov ecx, [edx+18h] ; number of Name Pointers
      00000183            mov ebx, [edx+20h] ; Name Pointer RVA
      00000186            add ebx, ebp ; convert RVA to VA
      00000188
      00000188 LOOP_NEXT_API:
      00000188            jecxz short NOT_FOUND
      0000018A            dec ecx
      0000018B            mov esi, [ebx+ecx*4] ; RVA of the export name
      0000018E            add esi, ebp ; convert RVA to VA
      00000190            xor edi, edi ; clear the checksum
      00000192            cld
      00000193 LOOP_NEXT_CHARACTER:
      00000193            xor eax, eax
      00000195            lodsb ; al <-- [esi], then esi++
      00000196            cmp al, ah ; is it zero (null-terminator)?
      00000198            jz short END_OF_API_NAME
      0000019A            ror edi, 13
      0000019D            add edi, eax
      0000019F            jmp short LOOP_NEXT_CHARACTER
      000001A1 END_OF_API_NAME:
      000001A1            cmp edi, [esp+arg_0] ; compare with the parameter checksum
      000001A5            jnz short LOOP_NEXT_API
      000001A7            mov ebx, [edx+24h] ; Ordinal Table RVA
      000001AA            add ebx, ebp ; convert RVA to VA
      000001AC            mov cx, [ebx+ecx*2] ; get the ordinal number
      000001B0            mov ebx, [edx+1Ch] ; Export Address Table RVA
      000001B3            add ebx, ebp ; convert RVA to VA
      000001B5            mov eax, [ebx+ecx*4] ; get RVA of the API via the ordinal number
      000001B8            add eax, ebp ; convert RVA to VA
      000001BA            jmp RETURN
      000001BF NOT_FOUND:
      000001BF            xor eax, eax
      000001C1 RETURN:
      000001C1            mov edx, ebp
      000001C3            pop edi
      000001C4            pop esi
      000001C5            pop ebp
      000001C6            pop ebx
      000001C7            retn
      000001C7 GetAPIaddress endp

      Example of a call:

                 push eax ; HMODULE (== virtual address) of urlmon.dll
                 push 702F1A36h ; checksum of URLDownloadToFileA
                 call GetAPIaddress

      The hashing algorithm can change. For example:

                 rol edi, 7
                 xor edi, eax

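
A de-obfuscation tool can reverse such checksums by hashing every export name of the system DLLs in advance. The sketch below mirrors the `ror edi, 13` / `add edi, eax` loop of the listing. The function names and the short name list are illustrative assumptions, and a real tool would take the names from the Export Tables of the DLLs in the dump.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """Checksum as computed by the listing: rotate right by 13, then add the character."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def build_lookup(export_names):
    """Precompute checksum -> API name for a list of export names."""
    return {ror13_hash(n): n for n in export_names}

# Illustrative name list only.
names = ["URLDownloadToFileA", "URLDownloadToFileW", "LoadLibraryA", "WinExec"]
lookup = build_lookup(names)
print(lookup.get(0x702F1A36))
```

With this table, the constant pushed before `call GetAPIaddress` maps back to URLDownloadToFileA. If the malware switches to a different rotation or to XOR, only `ror13_hash` has to change.
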
Decoding API Names by Hashing

• Backdoor.Darkmoon uses a more complicated hashing
  algorithm to resolve API addresses.
• The resolved API addresses are often stored in local (stack)
  variables.
  – This makes it difficult to resolve API names by inspecting memory dumps,
    because most stack variables are usually lost in a memory dump.
• Otherwise they are stored in heap (allocated) memory.
  – APIs are referenced through a structure, such as [esi+24h]. It is difficult for
    a generic tool to know where the register esi points, making it hard
    to resolve API names by inspecting memory dumps.

The Use of LoadLibrary() and GetProcAddress()

• Which APIs can be called is easily known by inspecting the
  Import Table.
  – If bind, listen, send, recv, RegSetValue, CreateRemoteThread and/or
    SetWindowsHook is imported, the program will attract the attention of virus analysts.
• If LoadLibrary() + GetProcAddress() is used instead of static
  linking, the Import Table does not reveal the suspicious imports.
  – W32.Spybot.Worm often does this, although the parameter strings passed to
    GetProcAddress(), such as “bind”, “listen” and so on, remain visible.
• If both the API-name strings and most other strings are
  encrypted, analysis takes more time than if they were not encrypted.
  – W32.Stration, prevalent from 2006 through 2007, does that.

The Use of LoadLibrary() and GetProcAddress()

      Example from W32.Stration.CX@mm:

      00401EE0   sub_401EE0 proc near
      00401EE0   var_18      = dword ptr -18h
      00401EE0   var_14      = dword ptr -14h
      00401EE0   var_10      = dword ptr -10h
      00401EE0   var_C       = dword ptr -0Ch
      00401EE0   var_8       = dword ptr -8
      00401EE0   var_4       = byte ptr -4
      00401EE0   arg_0       = dword ptr 4
      00401EE0   arg_4       = dword ptr 8
      00401EE0
      00401EE0   mov eax, dword_404118 ; saved API address
      00401EE5   or byte ptr word_40401C, 3Dh
      00401EEC   sub esp, 18h
      00401EEF   test eax, eax
      00401EF1   jnz short loc_401F48
      00401EF3   mov eax, ds:dword_4010C0 ; 637E7640h
      00401EF8   mov ecx, ds:dword_4010C4 ; 44657851h
      00401EFE   mov edx, ds:dword_4010C8 ; 7B70797Eh
      00401F04   mov [esp+18h+var_18], eax
      00401F07   mov eax, ds:dword_4010CC ; 7D755872h
      00401F0C   mov [esp+18h+var_14], ecx
      00401F10   mov ecx, ds:dword_4010D0 ; 17637472h
      00401F16   mov [esp+18h+var_10], edx
      00401F1A   mov dl, ds:byte_4010D4 ; 0
      00401F20   mov [esp+18h+var_C], eax
      00401F24   mov [esp+18h+var_8], ecx
      00401F28   mov [esp+18h+var_4], dl
      00401F2C   xor eax, eax
      00401F2E   mov edi, edi
      00401F30 loc_401F30:
      00401F30   xor byte ptr [esp+eax+18h+var_18], 17h ; decrypting
      00401F34   inc eax
      00401F35   cmp eax, 14h
      00401F38   jl short loc_401F30
      00401F3A   lea eax, [esp+18h+var_18]
      00401F3D   push eax                      ; 'WaitForSingleObject'
      00401F3E   call sub_401E40               ; get the API address
      00401F43   mov dword_404118, eax ; save the API address for the next time
      00401F48
      00401F48 loc_401F48:
      00401F48   mov ecx, [esp+18h+arg_4]
      00401F4C   mov edx, [esp+18h+arg_0]
      00401F50   push ecx
      00401F51   push edx
      00401F52   call eax ; call the API
      00401F54   add esp, 18h
      00401F57   retn 8
      00401F57   sub_401EE0 endp

It is difficult for a tool to resolve API names from memory dumps,
because not all the routines have been executed by the time
the memory dumps are taken.

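
The decryption loop at loc_401F30 is easy to reproduce offline. The sketch below feeds the five constants from the listing through the same single-byte XOR with key 17h over 14h bytes. `decrypt_api_name` is a hypothetical helper name.

```python
import struct

# Constants loaded by sub_401EE0 (dword_4010C0 .. dword_4010D0); byte_4010D4 is 0.
ENCRYPTED_DWORDS = [0x637E7640, 0x44657851, 0x7B70797E, 0x7D755872, 0x17637472]
KEY = 0x17

def decrypt_api_name(dwords, key, length=0x14):
    """XOR the first `length` bytes with `key`, as the loop at loc_401F30 does."""
    raw = b"".join(struct.pack("<I", d) for d in dwords) + b"\x00"  # little-endian dwords
    plain = bytes(b ^ key for b in raw[:length]) + raw[length:]
    return plain.split(b"\x00", 1)[0].decode("ascii")               # stop at the terminator

api_name = decrypt_api_name(ENCRYPTED_DWORDS, KEY)
print(api_name)
```

This only works once an analyst has located the constants and the key. A generic tool working from a memory dump has no such starting point, which is the difficulty the slide describes.
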
Basic API Obfuscation

• Most Windows viruses are written in high-level languages,
  such as C, C++, Visual Basic and Delphi.
• Runtime API obfuscation requires some code modification.
  Most virus authors choose to use special libraries or packer
  software to obfuscate APIs.
  – If a special library is linked instead of a regular import library, a plain call
    to an API will be redirected to obfuscating code, sometimes with
    redundant junk code.
  – If a packer with API obfuscation functionality is applied to a
    compiled EXE/DLL program, the virus author does not have to think about
    the API obfuscation at all, but just uses the packer.
  – Usually the APIs are resolved before the program proper starts, but
    after the unpacking process has completed. The obfuscating code, if
    any, is also generated at the time of API resolution.

Staged API Obfuscation

   Zero stage (usual API call):

   call ds:GetSystemTime
     or
   call j_GetSystemTime
   j_GetSystemTime: jmp ds:GetSystemTime

   One-stage API obfuscation:

   call sub_414258

   sub_414258:                    ; exists just to redirect the flow
   mov eax, ds:GetSystemTime
   jmp eax

   A de-obfuscation tool should generate an IDC script that
   renames the label sub_414258 to “GetSystemTime.”

   There are also multi-stage API obfuscations.

Staged API Obfuscation

  Extra-modular one-stage API obfuscation:

  4010C0: call ds:[402780h]
  ...
  402780:    dd 00370000h

  370000: mov eax, 77E7B476h     ; CreateFileA
  370005: jmp eax

  370000h resides in a different memory block, which is outside
  the scope of the module being analyzed.

  After de-obfuscation:

  4010C0: call ds:[CreateFileA]
  ...
  402780: CreateFileA dd 00370000h

  A de-obfuscation tool should rename this label (at 402780h).


00404A13 call dword ptr ds:0B5A068h
...
00B5A068 dd 7743DE3Ah                             ; SHFileOperationA

0B5A068h resides in a different memory block, which is outside the
scope of the module being analyzed.

After de-obfuscation:

00404A13 call dword ptr ds:0B5A068h ; SHFileOperationA

Because we cannot rename the label at 0B5A068h, which is outside the
scope of the module, we can only add a comment at 00404A13h.

Immediate Jumps


 004023A8 call ds:label_4130B4
 ...
 004130B4 label_4130B4 dd 972030h    ; pointing to a different memory block
 ...
 00972030 jmp near ptr 77E5B476h     ; CreateFileA

 A de-obfuscation tool should rename the label “label_4130B4” to “CreateFileA.”

 This is not jmp [nnnnnnnn], but jmp nnnnnnnn.
 It is rare to see a direct (immediate) jump to another
 module, because the operand of jmp must be
 relative to the jmp instruction itself, requiring an
 additional calculation step.

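
The additional calculation works in both directions. For a near jump (opcode E9), the 32-bit operand is relative to the address of the next instruction, so the target is the jump address plus 5 plus the signed operand. The helper names below are hypothetical, and the encoder exists only to build the demo bytes.

```python
import struct

def jmp_rel32_target(address, code):
    """Return the absolute target of an `E9 rel32` jump located at `address`."""
    if len(code) < 5 or code[0] != 0xE9:
        return None
    (rel,) = struct.unpack_from("<i", code, 1)        # signed 32-bit displacement
    return (address + 5 + rel) & 0xFFFFFFFF           # relative to the next instruction

def encode_jmp_rel32(address, target):
    """Inverse helper: the step the obfuscator has to perform."""
    rel = (target - (address + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)

code = encode_jmp_rel32(0x00972030, 0x77E5B476)       # addresses from the slide
print(code.hex(), hex(jmp_rel32_target(0x00972030, code)))
```
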
Jump-in

• Regardless of API obfuscation, an API call should reach the
  entry point of the target API.
  – If the call reaches 77E5B476h and 77E5B476h is the entry point of
    CreateFileA in kernel32.dll, it is de-obfuscated.


     call xxxx                        …
                                      77E5B091 : CreateFileW
       xxxx: jmp ds:[yyyy]
                                      77E5A47F : CreateFileMappingW
                                      77E5A543 : CreateFileMappingA
                                      77E5B476 : CreateFileA
 yyyy: dd zzzz
                                      77E546E8 : CreateFiberEx
                                      77E546D0 : CreateFiber
                                      77E54D43 : CreateEventW
       zzzz: mov edx, 77E5B476h       77E54DE3 : CreateEventA
             push edx                 …
             ret


                                                                         38
Jump-in


   00401922 call sub_403C08
   ...
   00403C08 jmp ds:off_404090
   ...                                                       …
   00404090 off_404090 dd offset unk_40C4A1                  77E545BE : SwitchToFiber
   ...                                                       77E74D56 : SuspendThread
   0040C4A1 unk_40C4A1: jmp near ptr 0040C4A4h               77E5A325 : SleepEx
   0040C4A3 db 0EAh ; dummy byte to distract                 77E41BEA : Sleep
   0040C4A4 push 0                                           77E4D2CF : SizeOfResource
   0040C4A6 jmp near ptr 77E41BECh                           …


                      This is the address of                 Sleep proc near
                            Sleep + 2,                        push 0
                      because “push 0” takes                  push dword ptr [esp+8]
                              2 bytes.                        call SleepEx
                                                              retn 4
                                                             Sleep endp

   The jump at 0040C4A1h and the dummy byte it skips are junk code.

   This method can also bypass API hooks and breakpoints, because the
   first instruction within the API routine is never executed.
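
A jump-in target can still be mapped to an API name by looking for the nearest export at or below the address. The sketch below assumes a small export list taken from this slide and a hypothetical `max_offset` tolerance; a real tool would load the full export table of each DLL.

```python
import bisect

# (address, name) pairs from the slide, sorted by address.
EXPORTS = sorted([
    (0x77E545BE, "SwitchToFiber"),
    (0x77E74D56, "SuspendThread"),
    (0x77E5A325, "SleepEx"),
    (0x77E41BEA, "Sleep"),
    (0x77E4D2CF, "SizeOfResource"),
])
ADDRESSES = [addr for addr, _ in EXPORTS]

def resolve_jump_in(address, max_offset=16):
    """Return (api_name, offset) if address lands at or just inside an API.

    max_offset is an assumed tolerance: the copied prologue is expected
    to cover only the first few bytes of the function.
    """
    i = bisect.bisect_right(ADDRESSES, address) - 1
    if i < 0:
        return None
    base, name = EXPORTS[i]
    offset = address - base
    if offset > max_offset:
        return None
    return name, offset

print(resolve_jump_in(0x77E41BEC))  # the jmp target at 0040C4A6h
```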
                                                                                                 39
Contents

     1     File Image vs. Memory Image

     2     API Analysis

     3     Generating Memory Dumps

     4     Runtime API Address Resolution

     5     Basic API Obfuscation

     6     Advanced API Obfuscation

                                            40
Logic Stage and Skipper Stage

 00D50000          sbb ecx,61h       ; meaningless instruction
 00D50003          jmp short 00D50006h
 00D50005          db 0E9h         ; placed to obfuscate in disassembler
 00D50006          mov ecx, 486366h ; meaningless instruction
 00D5000B          pop eax        ; return address
 00D5000C          lea eax,[eax+1] ; return address += 1
 00D5000F          push eax        ; return address is now incremented (skipper)
 00D50010          push 0D40000h ; the address of the next stage
 00D50015          retn        ; jump to the next stage



                   The skipper stage increments the
                            return address.


       The logic stage contains instructions other than calls and jumps,
     but they follow recognizable patterns, so emulation is not necessary.


                                                                                    41
 Logic Stage and Skipper Stage

   The program proper calls the obfuscated API:

   00411000              push eax
   00411001              call ds:[420008h]       ; points to logic and skipper stage
   00411007              db 0E9h                 ; a skipped byte
   00411008              or eax, eax             ; instruction pointer returns here from the call


                                         IDA is confused by 0E9h, which is the opcode of a jmp
                                         instruction, and shows “jmp xxxxxxxx” if a
                                         de-obfuscation tool does not skip this byte.


            Sample IDC script to deal with this skipper stage

            MakeName(0x420008, "RegOpenKeyEx");   // Rename the label
            MakeUnknown(0x411007, 1, 0);          // Undefine the skipped byte
            MakeCode(0x411008);                   // Re-analyze code
Observed in Backdoor.Graybird
                                                                                                     42
Copied and Substituted Obfuscation

  01230000      mov eax,fs:[18h]                 Copied API Obfuscation
  01230006      mov ecx,[eax+30h]
  01230009      mov eax, dword ptr [ecx+0B0h]
  0123000F      movzx edx, word ptr [ecx+0ACh]
  01230016      xor eax,0FFFFFFFEh                 This is an entire copy
  01230019      shl eax,0Eh                        of GetVersion API in
  0123001C      or eax,edx                             kernel32.dll.
  0123001E      shl eax,8
  01230021      or eax,[ecx+0A8h]
  01230027      shl eax,8
  0123002A      or eax,[ecx+0A4h]
  01230030      ret

• Since it does not refer to any OS modules, it is impossible to
  know what API is called from the address it reached, or
  indeed whether it is an API call at all.
• We have to search major DLLs for the API function that
  matches the copied API.

                                                                             43
Copied and Substituted Obfuscation

• A copied API may differ in its machine code from the original, because
  the relative distance of a jump or call changes when the code is moved.
   – Short jump (8-bit displacement, -128 to +127) vs. near jump (32-bit displacement in Win32 code)
   – The operand of a near call is relative to the address following the call instruction
  00402000 6A 00            push 0
  00402002 FF 74 24 08      push [esp+8]                         A copy of Sleep API
  00402006 E8 1A 83 A5 77   call SleepEx @ 77E5A325               from kernel32.dll.
  0040200B C2 04 00         ret 4


  77E41BEA 6A 00            push 0                              Genuine Sleep API in
  77E41BEC FF 74 24 08      push [esp+8]                           kernel32.dll.
  77E41BF0 E8 30 87 01 00   call SleepEx @ 77E5A325
  77E41BF5 C2 04 00         ret 4

      We have to compare the functions not by machine code but by logical meaning.
    Because the comparison is time-consuming, we should optimize the performance by
    searching the major DLLs in order of likelihood, for example
    kernel32.dll > advapi32.dll > user32.dll > shell32.dll.
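
One way to compare by meaning rather than by bytes is to rewrite each relative call as its absolute target before comparing. The sketch below is a toy: its instruction-length table covers only the four opcodes in the Sleep example, whereas a real tool would use a full disassembler. All names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

def normalize(code, base):
    """Turn machine code into a position-independent list of items."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0xE8:                                  # call rel32
            rel = struct.unpack_from("<i", code, i + 1)[0]
            out.append(("call", (base + i + 5 + rel) & MASK32))
            i += 5
        elif op == 0x6A:                                # push imm8
            out.append(bytes(code[i:i + 2])); i += 2
        elif op == 0xFF and code[i + 1] == 0x74:        # push [reg+disp8] with SIB
            out.append(bytes(code[i:i + 4])); i += 4
        elif op == 0xC2:                                # ret imm16
            out.append(bytes(code[i:i + 3])); i += 3
        else:
            raise ValueError("opcode %02X not covered by this toy table" % op)
    return out

copied  = bytes.fromhex("6A00 FF742408 E81A83A577 C20400")
genuine = bytes.fromhex("6A00 FF742408 E830870100 C20400")

print(copied == genuine)                                            # raw bytes differ
print(normalize(copied, 0x00402000) == normalize(genuine, 0x77E41BEA))
```

Both listings resolve their call to SleepEx at 77E5A325h, so the normalized forms match even though the raw bytes do not.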

                                                                                            44
Copied and Substituted Obfuscation

 00F40000        mov eax,fs:[18h]                Substituted API Obfuscation
 00F40006        mov eax,[eax+34h]
 00F40009        ret


• This function is found in ntdll.dll as RtlGetLastWin32Error.
• Most engineers are more familiar with GetLastError in
  kernel32.dll than with RtlGetLastWin32Error.
   – In fact, GetLastError in kernel32.dll is just redirected to
     RtlGetLastWin32Error in ntdll.dll.
• A de-obfuscation tool can keep a list of well-known APIs that
  are likely to be substituted, and present the more commonly
  used API names instead.
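
Such a list can be as simple as a lookup table. The sketch below is illustrative only: the first entry comes from this slide, and the remaining entries are assumed examples of kernel32.dll names that are commonly implemented in ntdll.dll. The function name is hypothetical.

```python
# Maps the name actually reached to the name most engineers recognize.
PREFERRED_NAMES = {
    ("ntdll.dll", "RtlGetLastWin32Error"):    ("kernel32.dll", "GetLastError"),
    ("ntdll.dll", "RtlAllocateHeap"):         ("kernel32.dll", "HeapAlloc"),
    ("ntdll.dll", "RtlEnterCriticalSection"): ("kernel32.dll", "EnterCriticalSection"),
    ("ntdll.dll", "RtlLeaveCriticalSection"): ("kernel32.dll", "LeaveCriticalSection"),
}

def display_name(module, name):
    """Return the more familiar (module, name) pair if one is known."""
    key = (module.lower(), name)
    return PREFERRED_NAMES.get(key, key)

print(display_name("NTDLL.DLL", "RtlGetLastWin32Error"))
```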


                                                                            45
Push-ret and Push-calc-ret
Obfuscation

 push 71A23ECEh   ; bind()                    Push-ret Obfuscation
 ret




                                             Push-calc-ret Obfuscation
 003C80B0         call dword ptr [3E82B8h] ; calls 17B000Dh
 ...
 003E82B8         dd 17B000Dh ; in another memory block
 ...
 017B000D         push 3E62B8CDh
 017B0012         sub dword ptr [esp], 0CCC079FFh ; = 71A23ECEh (bind())
 017B0019         ret



                       A de-obfuscation tool should detect this pattern and calculate
                                    the value at the top of the stack.
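
The calculation is ordinary 32-bit wraparound arithmetic. A minimal sketch, assuming the tool has already extracted the pushed constant and the operation applied to `[esp]` (the function name and the set of supported operations are hypothetical):

```python
MASK32 = 0xFFFFFFFF

def resolve_push_calc_ret(pushed, operation, operand):
    """Compute the value left on top of the stack before 'ret'."""
    if operation == "sub":
        value = pushed - operand
    elif operation == "add":
        value = pushed + operand
    elif operation == "xor":
        value = pushed ^ operand
    else:
        raise ValueError("unsupported operation: " + operation)
    return value & MASK32      # registers and memory wrap at 32 bits

# push 3E62B8CDh / sub dword ptr [esp], 0CCC079FFh / ret
print(hex(resolve_push_calc_ret(0x3E62B8CD, "sub", 0xCCC079FF)))
```

The result, 71A23ECEh, is the address the slide labels as bind().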


                                                                                         46
Push-ret and Push-calc-ret
Obfuscation
                                         Enhanced Push-ret Obfuscation
 004014DA   mov esi, offset unk_404907   ; stores DWORD-value list
 004014DF   push dword ptr [esi+30h]     ; pushes 8DC82618h
 004014E2   push loc_4014ED              ; return address
 004014E7   push loc_4010A4              ; call destination
 004014EC   ret                          ; calls 4010A4h
 004014ED   <next instruction>           ; returns here
 ...
 004010A4   mov edx, [esp+4]             ; 8DC82618h <-- came from [esi+30h]
 004010A8   mov ecx, [esp+0]             ; 004014EDh (return address)
 004010AB   add esp, 8
 004010AE   ror edx, 0FAh
 004010B1   sub edx, dword_404027        ; == 0FA23D1ADh
 004010B7   push ecx                     ; return address
 004010B8   push edx                     ; API address of CreateFileA
 004010B9   ret                          ; jumps to CreateFileA


                       There is no call instruction.
                    It does not look like an API call.
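
The decoding routine at 4010A4h can be reproduced without emulation. The sketch below assumes the tool has read the encoded DWORD from `[esi+30h]` and the key from `dword_404027`; note that x86 masks the rotate count to 5 bits, so `ror edx, 0FAh` rotates by 26. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """32-bit rotate right; the CPU uses only the low 5 bits of count."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

def decode_api_address(encoded, rotate_count, key):
    """Mirror 'ror edx, count' followed by 'sub edx, key'."""
    return (ror32(encoded, rotate_count) - key) & MASK32

# Values from the listing: 8DC82618h, ror by 0FAh, key 0FA23D1ADh.
print(hex(decode_api_address(0x8DC82618, 0xFA, 0xFA23D1AD)))
```

The result is 77E5B476h, the address of CreateFileA used throughout these slides.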
                                                                               47
Push-ret and Push-calc-ret
Obfuscation

 mov esi, xxxx                              1. Detect this pattern.
 push dword ptr [esi+xx]
 push xxxx
 push xxxx
 ret
                                        2. Read the value at [esi+xx] and
                                               get the API name.

 004014DA            mov esi, offset unk_404907
 004014DF            push dword ptr [esi+30h]     ; calls CreateFileA
 004014E2            push loc_4014ED              ; returns to 4014EDh
 004014E7            push loc_4010A4
 004014EC            ret
 004014ED            <next instruction>




                                            3. Add the comments in IDA.

                                                                             48
Padded and Copied API Obfuscation
(Themida)

 00401B77           call 2930000h
 ...
 02930000           push edx   ; making room for EBP
 02930001           push eax   ; save EAX
 02930002           push edx   ; save EDX
 02930003           jmp 293000Eh
 ...
 0293000E           rdtsc     ; destroys EDX:EAX
 02930010           jmp 2930029h
 ...
 02930029           pop edx     ; restore EDX
 0293002A           pop eax     ; restore EAX
 0293002B           mov [esp],ebp


     The initial code is copied from the API function, but its
     instructions are replaced with longer equivalents and
     interleaved with redundant instructions.
     Even so, it follows recognizable patterns.


                                                                  49
Padded and Copied API Obfuscation
(Themida)

 00401B77         call 2930000h
 ...
 02930000         push edx   ; making room for EBP
 02930001         push eax   ; save EAX
 02930002         push edx   ; save EDX
 02930003         jmp 293000Eh
 ...
 0293000E         rdtsc     ; destroys EDX:EAX
 02930010         jmp 2930029h
 ...
 02930029         pop edx     ; restore EDX
 0293002A         pop eax     ; restore EAX
 0293002B         mov [esp],ebp


       1. Remove the redundant instructions,
         according to an internal redundant
                  code block list.



                                                     50
Padded and Copied API Obfuscation
(Themida)

 00401B77   call 2930000h
 ...
 02930000   push edx     ; making room for EBP
 0293002B   mov [esp],ebp


               2. Replace the instructions according to   push reg32(1)
                    an internal conversion table.         mov [esp], reg32(2)
                                                           => push reg32(2)

 00401B77   call 2930000h
 ...
 02930000   push ebp


                  3. Follow the steps necessary for
                         Jump-in obfuscation.
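
Steps 1 and 2 can be sketched as two passes over the instruction trace, after the unconditional jumps have been followed. The redundant-block list and the conversion rule below contain only the single pattern shown on these slides; a real tool would carry many more. All names are hypothetical.

```python
import re

# Step 1: blocks known to have no net effect (save, clobber, restore).
REDUNDANT_BLOCKS = [
    ("push eax", "push edx", "rdtsc", "pop edx", "pop eax"),
]

def strip_redundant(trace):
    out = list(trace)
    changed = True
    while changed:
        changed = False
        for block in REDUNDANT_BLOCKS:
            n = len(block)
            for i in range(len(out) - n + 1):
                if tuple(out[i:i + n]) == block:
                    del out[i:i + n]
                    changed = True
                    break
    return out

# Step 2: push reg32(1) ; mov [esp], reg32(2)  =>  push reg32(2)
def fold_push_mov(trace):
    out = []
    for insn in trace:
        m = re.fullmatch(r"mov \[esp\], ?(e\w\w)", insn)
        if m and out and re.fullmatch(r"push e\w\w", out[-1]):
            out[-1] = "push " + m.group(1)
        else:
            out.append(insn)
    return out

trace = ["push edx", "push eax", "push edx", "rdtsc",
         "pop edx", "pop eax", "mov [esp],ebp"]
print(fold_push_mov(strip_redundant(trace)))
```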




                                                                                 51
Padded and Copied API Obfuscation
(Enigma)

 00401753   call dword ptr ds:973245h     ; it points to 974819h
 ...
 00974819   call 97481Fh          ;
 0097481E   push esi              ; dummy instruction
 0097481F   call 974827h          ;
 00974824   jmp 97482Ah
 00974826   db 15h               ; dummy code
 00974827   ret 4                ;
 0097482A   add esp, 5C9099Bh
 00974830   mov [esp-5C9099Fh],esi ; mov [esp-4], esi
 00974837   add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
 0097483D   call 974846h
 00974842   db 80h, 0DEh, 9Dh, 70h ; dummy code
 00974846   add esp,4


        It is similar to Themida, but the pattern is different.



                                                                             52
Padded and Copied API Obfuscation
(Enigma)

 00401753   call dword ptr ds:973245h   ; it points to 974819h
 ...
 00974819   call 97481Fh          ;
 0097481E   push esi              ; dummy instruction
 0097481F   call 974827h          ;
 00974824   jmp 97482Ah
 00974826   db 15h               ; dummy code
 00974827   ret 4                ;
 0097482A   add esp, 5C9099Bh
 00974830   mov [esp-5C9099Fh],esi ; mov [esp-4], esi
 00974837   add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
 0097483D   call 974846h
 00974842   db 80h, 0DEh, 9Dh, 70h ; dummy code
 00974846   add esp,4


        1. Remove the redundant instructions,
          according to an internal redundant
                   code block list.


                                                                             53
Padded and Copied API Obfuscation
(Enigma)

 00401753   call dword ptr ds:973245h   ; it points to 974819h
 ...
 00974819   ….
 0097482A   add esp, 5C9099Bh
 00974830   mov [esp-5C9099Fh],esi ; mov [esp-4], esi
 00974837   add esp, 0FA36F661h ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)

                        2. Replace the instructions according to
                             an internal conversion table.

 00401753 call dword ptr ds:973245h                add esp, n
 ...                                               mov [esp-(n+4)], reg32(1)
 00974819 push esi                                 add esp, (100000000h - (n+4))
                                                    => push reg32(1)


                           3. Follow the steps necessary for
                                  Jump-in obfuscation.
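
The conversion rule can be checked with 32-bit arithmetic: the two `add esp` constants must sum to -4, and the store displacement must equal n + 4, as the conversion table prescribes. A minimal sketch with a hypothetical function name:

```python
MASK32 = 0xFFFFFFFF

def fold_enigma_push(n, store_disp, m, reg):
    """Recognize: add esp, n / mov [esp-store_disp], reg / add esp, m.

    Returns the equivalent 'push reg' text, or None if the constants
    do not satisfy the rule.
    """
    stores_at_minus_4 = store_disp == ((n + 4) & MASK32)
    net_esp_is_minus_4 = ((n + m) & MASK32) == (-4 & MASK32)
    if stores_at_minus_4 and net_esp_is_minus_4:
        return "push " + reg
    return None

# n = 5C9099Bh, displacement n + 4 = 5C9099Fh, m = 0FA36F661h
print(fold_enigma_push(0x5C9099B, 0x5C9099F, 0xFA36F661, "esi"))
```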


                                                                                    54
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)
• Sometimes emulation is required to obtain the target API address.
   – Example 1: All API calls reach a common dispatcher routine, which receives
     the EDX register as a parameter. EDX is converted into an index into a
     table of API addresses.
   – Example 2: All API calls reach a common dispatcher routine, which receives
     no parameter. Instead, it determines where it was called from by reading the
     return address on the stack, then looks up the target API address for that
     return address in a table.

                                                             Return address in stack

   EDX         Dispatcher          API                          Dispatcher         API


                  Table         Example 1                          Table          Example 2


  Emulation is effective in dealing with these kinds of API obfuscation. But it is time-
  consuming and sometimes yields wrong answers. Emulation should be a last resort,
  employed only when the use of conventional analytical techniques proves impossible.

                                                                                               55
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)
008B6037 55             push ebp
008B6038 8B EC          mov ebp, esp
008B603A 81 EC 30 01 00 00 sub esp, 130h                      Exception handler
008B6040 EB 04          jmp short 008B6046
008B6046 60             pusha                        Obsidium sets up a Structured
008B6047 EB 04          jmp short 008B604D
008B604D 9C             pushf                     Exception Handler (SEH) to continue,
008B604E EB 03          jmp short 008B6053            while impeding a debugger.
008B6053 EB 04          jmp short 008B6059
008B6059 E8 00 00 00 00 call $+5 (008B605E)
008B605E EB 01          jmp short 008B6061        008B6087 64 89 20       mov fs:[eax], esp
008B6061 5E             pop esi                   008B608A EB 01          jmp short 008B608D
008B6062 EB 03          jmp short 008B6067        008B608D EB 03          jmp short 008B6092
008B6067 EB 01          jmp short 008B606A        008B6092 EB 02          jmp short 008B6096
008B606A 8B 96 64 03 00 00 lea edx, [esi+364h]    008B6096 EB 36          jmp short 008B60CE
008B6070 EB 04          jmp short 008B6076        008B60CE EB 01          jmp short 008B60D1
008B6076 33 C0          xor eax, eax              008B60D1 8B 54 24 30    mov edx, [esp+30h]
008B6078 EB 03          jmp short 008B607D        008B60D5 EB 01          jmp short 008B60D8
008B607D 52             push edx                  008B60D8 EB C1          jmp short 008B609B
008B607E EB 01          jmp short 008B6081        008B609B EB 02          jmp short 008B609F
008B6081 64 FF 30       push dword ptr fs:[eax]   008B609F F7 C2 01 00 00 00 test edx, 1
008B6084 EB 01          jmp short 008B6087        008B60A5 EB 04          jmp short 008B60AB
                                                  008B60AB 74 0C          jz 008B60B9
                                                  008B60AD EB 04          jmp short 008B60B3
                                                  008B60B3 0F 0B          ud2             ; undefined opcode
     Many meaningless short jumps.                008B60B5 EB 02          jmp short 008B60B9
        Let's remove them all.                    008B60B9 EB 03          jmp short 008B60BE
                                                  008B60BE F7 F0          div eax         ; division by zero

                                                                                                                56
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)
008B6037 55                push ebp
008B6038 8B EC             mov ebp, esp
008B603A 81 EC 30 01 00 00 sub esp, 130h
008B6046 60                pusha      ; push EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI
008B604D 9C                pushf           ; push EFLAGS
008B6059 E8 00 00 00 00    call $+5 (008B605E)
008B6061 5E                pop esi           ; esi = 008B6061h
008B606A 8B 96 64 03 00 00 lea edx, [esi+364h] ; edx = 008B63C5h
008B6076 33 C0             xor eax, eax
008B607D 52                push edx            ; exception handler address (008B63C5h)
008B6081 64 FF 30          push dword ptr fs:[eax]
008B6087 64 89 20          mov fs:[eax], esp
008B60D1 8B 54 24 30       mov edx, [esp+30h] ; value from uninitialized stack variable
008B609F F7 C2 01 00 00 00 test edx, 1
008B60AB 74 0C             jz 008B60BE
008B60B3 0F 0B             ud2              ; undefined opcode exception
008B60BE F7 F0             div eax          ; division by zero exception

                                   An exception occurs, and execution
                                       resumes at 008B63C5h.
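
Removing the short jumps amounts to following each `EB rel8` chain to its final destination. A minimal sketch, assuming the code is available as a mapping from address to instruction bytes (a hypothetical representation; a real tool would read the dump directly):

```python
MASK32 = 0xFFFFFFFF

def follow_short_jumps(code, address):
    """Follow a chain of 'EB rel8' jumps and return the final address."""
    seen = set()
    while address in code and code[address][0] == 0xEB and address not in seen:
        seen.add(address)                      # guard against jump loops
        rel = code[address][1]
        if rel >= 0x80:                        # rel8 is signed
            rel -= 0x100
        address = (address + 2 + rel) & MASK32
    return address

# A few instructions from the Obsidium listing.
code = {
    0x008B6096: bytes.fromhex("EB36"),
    0x008B60CE: bytes.fromhex("EB01"),
    0x008B60D1: bytes.fromhex("8B542430"),      # mov edx, [esp+30h]
    0x008B60D8: bytes.fromhex("EBC1"),          # backward jump
    0x008B609B: bytes.fromhex("EB02"),
    0x008B609F: bytes.fromhex("F7C201000000"),  # test edx, 1
}
print(hex(follow_short_jumps(code, 0x008B6096)))
print(hex(follow_short_jumps(code, 0x008B60D8)))
```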

                                                                                           57
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)


      Exception handler

   008B63C5 EB 03            jmp short 008B63CA
   008B63CA E8 00 00 00 00   call $+5 (008B63CF)
   008B63CF EB 02            jmp short 008B63D3
   008B63D3 5A               pop edx
   008B63D4 EB 01            jmp short 008B63D7
   008B63D7 8B 8A 95 FB FF FF mov ecx, [edx-46Bh]
   008B63DD EB 04            jmp short 008B63E3



                                                    The exception handler is simply
                                                    a continuation of the API
                                                    resolution code, not an error
                                                             handler.



                                                                                 58
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)
        Dummy call-ret

    …
    call 77E7xxxxh                                        Kernel32.dll
    …


 1. The obfuscator searches a system                   C3    ret
    DLL for a byte of 0C3h in advance.
 2. During an API call, it calls the
    address where 0C3h was found.
 3. But 0C3h is just a “ret” instruction.



          A de-obfuscation tool should ignore this dummy call to a system DLL.
            Otherwise, the tool would think it has reached an API in the DLL.
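
A simple filter for this trick is sketched below. It assumes the tool can read a byte from the process image and knows the set of exported entry points; both are represented here by hypothetical stand-ins, and the addresses are made up for illustration.

```python
RET_OPCODE = 0xC3

def is_dummy_call(read_byte, target, export_addresses):
    """True if the call lands on a bare 'ret' that is not an API entry point."""
    return read_byte(target) == RET_OPCODE and target not in export_addresses

# Hypothetical memory contents and export set for illustration only.
memory = {0x77E71234: 0xC3, 0x77E5B476: 0x8B}
exports = {0x77E5B476}

print(is_dummy_call(memory.get, 0x77E71234, exports))
print(is_dummy_call(memory.get, 0x77E5B476, exports))
```

Checking the export set first keeps the filter from discarding a genuine API whose first byte happens to be 0C3h.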



                                                                                  59
Splicing Intensive Instructions to
Provide Obfuscation (Obsidium)

16-bit addressing mode

64 67 FF 36 00 00 push dword ptr fs:[0] ; the 67h prefix switches from 32-bit to 16-bit addressing

64 67 89 26 00 00 mov fs:[0], esp




 It is unusual to see 16-bit addressing mode instructions in a Win32 user-mode application,
                  but the emulator should still support 16-bit addressing mode.





                                                                                             60
Conclusion

   Uncovering obfuscated API calls is a difficult task given the wide
 range of obfuscation techniques that can be used and combined to
 hide a program's functionality.
   Emulation centered on the call instruction may initially seem to be
 an effective method of de-obfuscation but suffers from the
 disadvantage of being defeated by the copying of API code and may
 yield false positives when the emulated instruction pointer reaches an
 OS module. Emulation can also be time-consuming and as such
 may not be the best choice in situations in which results are required
 in a timely manner.
  It is therefore necessary to design a modular de-obfuscation tool
 able to deal with the myriad of techniques described in this
 presentation.



                                                                          61
                                                 Thank You!


Questions? (in easy English)
Questions? (Japanese is welcome)
Questions? (not in Russian, please)

Masaki Suenaga
masaki_suenaga@symantec.com

Copyright © 2009 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or
registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be
trademarks of their respective owners.
This document is provided for informational purposes only and is not intended as advertising. All warranties relating to
the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The
information in this document is subject to change without notice.

				