A Museum of API Obfuscation on Win32
Masaki Suenaga, Senior Software Engineer
Symantec Security Response
November 2009

Packed executable files
• Each AV vendor has its own unpacker program
 – Unpacking code and data is the main purpose
 – APIs (imports) may be resolved
 – Unpacked programs can be disassembled without problems
• If unpacking fails, take a process memory dump
 – Can be used to determine whether the file is viral
 – Not suitable for disassembling as-is
 – We need to adjust the PE (Portable Executable) header
 – We need to resolve APIs (imports)
 – We need a tool to facilitate these tasks

Contents
1 File Image vs. Memory Image
2 API Analysis
3 Generating Memory Dumps
4 Runtime API Address Resolution
5 Basic API Obfuscation
6 Advanced API Obfuscation

File image vs. memory image
File image: PE header, followed by the sections (CODE, Constant DATA, Variable DATA, Resource DATA)
Memory image: PE header, CODE, Constant DATA, Variable DATA, Resource DATA; re-locatable in the 2-GB address space
* PE = Portable Executable

The Role of the Loader
1. CreateProcess("A.exe"): the EXE file is read.
2. Allocate memory.
3. Copy each section from the EXE file into memory. Alignments are different.
4. Load imported DLLs.
5. Resolve APIs (Import Addresses).

API Address Resolution in the Loader
Example: TranslateMessage(&msg);
    LEA EAX, [EBP-20h]
    PUSH EAX
    CALL [01001270h]        ; IAT entry
or
    MOV EDI, [01001270h]    ; IAT entry
    LEA EAX, [EBP-20h]
    PUSH EAX
    CALL EDI
By parsing the Import Directory Table, DLL names and API names can be obtained. IDA (Interactive Disassembler) shows API names as long as these tables are intact.
* IAT = Import Address Table

Development Environments and API Calls
• Microsoft Visual C/C++
 – An API call is compiled to CALL dword ptr [IAT entry]
 – The C++ runtime uses name decoration to express return values and parameters
   • void * __cdecl operator new (unsigned int) is exported as ??2@YAPAXI@Z
   • IDA will show "operator new"
 – MFC (Microsoft Foundation Class) does not export by name
   • void CWnd::~CWnd() is ??1CWnd@@UAE@XZ
   • MFC42.DLL exports it as the 818th entry. The Import Directory Table only says "the 818th of MFC42.DLL"
   • But IDA has a list of entry numbers and API names, which lets it show the API names
• Borland C/C++ and Delphi
 – An API call is compiled to CALL near ptr j_API_xxx, and at j_API_xxx: jmp dword ptr [IAT entry]
   • IDA shows j_API_xxx as an API name, and the IAT entry as __imp_API_name
• Microsoft Visual Basic
 – A call to the VB runtime library is compiled to CALL dword ptr [IAT entry]
 – A call to a Windows API (any DLL function) has a special form

Visual Basic declaration:
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" _
        (ByVal pCaller As Long, _
         ByVal szURL As String, _
         ByVal szFileName As String, _
         ByVal dwReserved As Long, _
         ByVal lpfnCB As Long) As Long

Compiled form:
    00408094: db 'urlmon',0               ; DLL name
    004080A0: db 'URLDownloadToFileA',0   ; API name
    004080B4: dd 408094                   ; offset of 'urlmon'
    004080B8: dd 4080A0                   ; offset of 'URLDownloadToFileA'
    004080BC: dd 040000
    004080C0: dd 4092D8                   ; offset of a structure
    004080CC: URLDownloadToFileA proc near
              mov eax, dword_4092E0       ; initially zero. API address from the next call.
              or eax, eax
              jz short 4080D7             ; If not resolved yet, call VB Runtime library.
              jmp eax
    004080D7: push 4080B4
              mov eax, offset DllFunctionCall ; it is JMP [__imp_DllFunctionCall]
              jmp eax
              URLDownloadToFileA endp

API Analysis
• Knowledge of APIs is not required
 – If a file is a variant of a well-known virus
 – If a file has some obvious strings used for malicious purposes
• Knowledge of APIs is required
 – If the strings are used for both benign and malicious purposes
   • Black-box testing will not reveal keystroke-stealing activity. Even if a file contains a string related to an online game, we cannot be 100% sure the file is viral without finding the code that hooks the keystrokes. We can guess from the form of the parameters, but knowing which API is called confirms it.
 – When a precise analysis is necessary to explain and/or remediate
   • Precise information can help users avoid potential subsequent damage.
   • Without knowing the precise behavior, a remediation tool can miss something.

API Analysis vs. API Monitoring
• Many products and tools can monitor the APIs called
 – They hook some points of the API flow and log which APIs have been called
 – If a suspicious combination of APIs is caught, the running program can be marked as malicious
 – But monitoring programs cannot know which APIs will be called
• We want to know the APIs from a memory dump of a running process
 – In many cases, APIs have been resolved before the main program starts
 – We can do white-box analysis using the memory dump taken during a black-box analysis
 – We may be able to know what the program will do, even if some routines have not run in the test environment.

Avoiding Guesswork
• We can guess an API from its parameters
 – 0x80000001 may be HKEY_CURRENT_USER, and the API may be related to Windows registry access.
 – "Software\Microsoft\Windows\CurrentVersion" suggests the API may be RegOpenKey
• Some APIs are hard to guess from their parameters
 – For example, GetLocalTime(&systime) just takes a pointer as its parameter
 – If we guess the APIs, we will write an ambiguous report
• A memory dump contains hints that identify the APIs
 – If the loader has resolved an API, an IAT entry holds the address of the API. For example, 0x77D16017 in the IAT means GetSystemMetrics on a certain version of Windows XP.
• However, some programs obfuscate the APIs

Motivation for API Obfuscation
• If a program is obfuscated, it takes more time to analyze.
 – If strings are also encrypted, it takes even more time.
• Even if a proprietary packer is used, a memory dump can be a great hint for analysis. Some authors want to obfuscate the APIs called even in memory dumps, for benign reasons.
 – Online game client software does not want you to hack the program and play an unfair game.
 – Authors of unfamiliar encryption and decryption algorithms want to hide the APIs so that the program looks difficult to analyze.
 – The checking algorithm for product keys should be kept secret.
 – A producer wants to protect its unpatented technologies or trade secrets from technology theft.
• Sometimes redundant code is also inserted as a distraction.
• Commercial packers are often used for program obfuscation.

Motivation for Overcoming API Obfuscation
• Our job is not only to detect viruses, but also to explain their behavior to customers.
 – If we can provide precise information in a timely manner, our customers will be protected from further damage.
   • By closing ports, rejecting accesses at a proxy server, setting registry entries beforehand, changing the system time or regional settings, and so on.
• We have to create removal (remediation) tools.
 – Without precise knowledge of the behavior, the tools may miss something important.
From now on, assume we are developing a de-obfuscation tool that generates an IDC script.
An IDC script can change label names, add comments, and so forth in IDA.

Resolving API Function Name w/o Import Table
    CALL [01001480h]
    01001480h: 77D16017
Loaded modules:
    MSVCRT.DLL     77BE0000 – 77C32FFF
    GDI32.DLL      77C40000 – 77C7FFFF
    USER32.DLL     77D10000 – 77D9AFFF
    ADVAPI32.DLL   77DA0000 – 77E3AFFF
    KERNEL32.DLL   77E40000 – 77F4DFFF
77D16017 falls within USER32.DLL. Parse its Export Table and get the API name GetSystemMetrics.

Trusting the Import Table
• The Import Table in a memory dump cannot necessarily be trusted.
 – The Import Table may have been deliberately removed.
 – The Import Table may have been replaced with a fake table.
   • IDA will show you fake API names.
• We should erase the Import Table of the memory dump.

Adjusting Image Base
• If relocation occurs, some operands of CPU instructions change in memory. But the Image Base in the PE header is left unchanged.
Original (Image Base is 100000h):
    MOV EAX, offset 102000h
    …
    102000: db 'test1'
Relocated (Image Base is 130000h):
    MOV EAX, offset 132000h
    …
    132000: db 'test1'
Both 'test1' strings are found at 2000h from the base address. But the operand of "MOV EAX" has been changed from 102000h to 132000h by the loader. IDA deems that the string exists at 32000h from the base address, because the Image Base value in the PE header is still 100000h. If the Image Base value is adjusted to 130000h, IDA shows the correct string.

Adjusting Section Tables
File Alignment = 200h, Memory Alignment = 1000h
    First section:  Pointer To Raw Data = 200h, Virtual Address = 1000h
    Second section: Pointer To Raw Data = 600h, Virtual Address = 2000h
In the memory dump the sections actually sit at 1000h and 2000h, but IDA thinks the first section is found at 200h and the second at 600h, so it reads the wrong data. We should adjust Pointer To Raw Data to be identical to Virtual Address.
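The two header fixes described above (Image Base and Pointer To Raw Data) can be sketched in Python. This is only a minimal sketch, assuming a 32-bit (PE32) memory dump whose header is still present; the function name `flatten_section_table` is hypothetical, and copying Virtual Size into Size Of Raw Data is an extra assumption that the slides do not state.

```python
import struct
from typing import Optional


def flatten_section_table(dump: bytearray, new_image_base: Optional[int] = None) -> None:
    """Patch a PE32 memory dump in place so that file offsets equal RVAs."""
    if dump[:2] != b"MZ":
        raise ValueError("no MZ header")
    (e_lfanew,) = struct.unpack_from("<I", dump, 0x3C)      # offset of "PE\0\0"
    if dump[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature")

    file_header = e_lfanew + 4
    (num_sections,) = struct.unpack_from("<H", dump, file_header + 2)
    (opt_size,) = struct.unpack_from("<H", dump, file_header + 16)
    optional_header = file_header + 20

    if new_image_base is not None:
        # ImageBase is at offset 28 of the PE32 optional header.
        struct.pack_into("<I", dump, optional_header + 28, new_image_base)

    section_table = optional_header + opt_size
    for i in range(num_sections):
        entry = section_table + i * 40                       # 40 bytes per section header
        virtual_size, virtual_address = struct.unpack_from("<II", dump, entry + 8)
        # SizeOfRawData (offset 16) and PointerToRawData (offset 20).
        # Assumption: the raw size is set to the virtual size.
        struct.pack_into("<II", dump, entry + 16, virtual_size, virtual_address)
```

Called as `flatten_section_table(dump, 0x130000)` on the relocated example above, this would set the Image Base to 130000h and make every section's Pointer To Raw Data equal to its Virtual Address.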
Recreating a Missing Header
• Once an EXE has started, its PE header is no longer necessary in memory.
• There are in fact some programs that erase the PE header in memory.
 – IDA cannot disassemble if the PE header is missing.
• Some programs destroy the Section Tables in memory.
 – IDA may not show the correct disassembly.
• In both cases, we should recreate a PE header along with a single flat Section Table.

Searching Hidden Modules
• An EXE is loaded first. Some DLLs are loaded next. A DLL may load other DLLs, if they are not loaded yet.
• All the EXE and DLLs (modules) loaded by the OS are managed. They can be enumerated with the EnumProcessModules API. The details of each module can be retrieved with the GetModuleInformation API.
• Traditional packers restore the unpacked program in the same module, which can be enumerated. But some packers restore the unpacked program in allocated memory that is not enumerated by EnumProcessModules. We should search all memory blocks for hidden modules.

API Calls Made by Injected Threads
• A program (process) cannot directly access the memory belonging to another process. But some APIs allow a process to read, write or allocate memory in another process.
• VirtualAllocEx() is used to allocate memory in a different process.
 – By copying code to the allocated memory and calling CreateRemoteThread(), a thread can be injected.
 – If a malicious thread is injected into Internet Explorer, the malicious code can act as a part of Internet Explorer, which lets it bypass a firewall.
• Unless the injected thread is in a DLL, APIs must be resolved by the thread or its caller.
 – Some threads receive an API list as a parameter (assuming the system DLLs are loaded at the same addresses as in the caller's process); others resolve the APIs themselves and store the API addresses in their stack or heap. It is not easy to create a generic tool that resolves the APIs used by injected threads, especially in the former case, because a tool cannot easily tell from memory dumps what was passed as a parameter to the injected thread.

Other Memory Blocks
Memory layout of a process:
    (vacant)
    Stack for main thread
    Allocated Memory    ; can contain an injected thread, or a "stage" program / function table for API obfuscation
    (vacant)
    EXE                 ; we primarily need the memory dumps of the EXE or DLL, or both
    DLL
    Allocated Memory    ; can contain a hidden module
    (vacant)
    System DLLs         ; imported DLLs are necessary to obtain API function names

Classification of API Obfuscation
API Obfuscation is divided into:
• Runtime API Address Resolution: the API address is resolved after the main program has started.
• Initialization-time API Obfuscation: the API address is resolved before the main program starts.
Decoding API Names by Hashing
Sample: Trojan.Anicmoo
    0000016F GetAPIaddress proc near
    0000016F arg_0 = dword ptr 14h            ; DWORD checksum value
    0000016F arg_4 = dword ptr 18h            ; virtual address of module (DLL)
    0000016F
    0000016F push ebx
    00000170 push ebp
    00000171 push esi
    00000172 push edi
    00000173 mov ebp, [esp+arg_4]             ; module handle (== VA of DLL image base)
    00000177 mov eax, [ebp+3Ch]               ; position of PE header
    0000017A mov edx, [ebp+eax+78h]           ; Export Directory Table
    0000017E add edx, ebp                     ; convert RVA to VA
    00000180 mov ecx, [edx+18h]               ; number of Name Pointers
    00000183 mov ebx, [edx+20h]               ; Name Pointer RVA
    00000186 add ebx, ebp                     ; convert RVA to VA
    00000188
    00000188 LOOP_NEXT_API:
    00000188 jecxz short NOT_FOUND
    0000018A dec ecx
    0000018B mov esi, [ebx+ecx*4]             ; RVA of the export name
    0000018E add esi, ebp                     ; convert RVA to VA
    00000190 xor edi, edi                     ; clear the checksum
    00000192 cld
    00000193 LOOP_NEXT_CHARACTER:
    00000193 xor eax, eax
    00000195 lodsb                            ; al <-- [esi], then esi++
    00000196 cmp al, ah                       ; is it zero (null-terminator)?
    00000198 jz short END_OF_API_NAME
    0000019A ror edi, 13
    0000019D add edi, eax
    0000019F jmp short LOOP_NEXT_CHARACTER
    000001A1 END_OF_API_NAME:
    000001A1 cmp edi, [esp+arg_0]             ; compare with the parameter checksum
    000001A5 jnz short LOOP_NEXT_API
    000001A7 mov ebx, [edx+24h]               ; Ordinal Table RVA
    000001AA add ebx, ebp                     ; convert RVA to VA
    000001AC mov cx, [ebx+ecx*2]              ; get the ordinal number
    000001B0 mov ebx, [edx+1Ch]               ; Export Address Table RVA
    000001B3 add ebx, ebp                     ; convert RVA to VA
    000001B5 mov eax, [ebx+ecx*4]             ; get RVA of the API via the ordinal number
    000001B8 add eax, ebp                     ; convert RVA to VA
    000001BA jmp RETURN
    000001BF NOT_FOUND:
    000001BF xor eax, eax
    000001C1 RETURN:
    000001C1 mov edx, ebp
    000001C3 pop edi
    000001C4 pop esi
    000001C5 pop ebp
    000001C6 pop ebx
    000001C7 retn
    000001C7 GetAPIaddress endp
The hashing algorithm can change. For example, instead of "ror edi, 13 / add edi, eax":
    rol edi, 7
    xor edi, eax
The caller passes the module handle and the checksum:
    push eax                                  ; HMODULE (== virtual address) of urlmon.dll
    push 702F1A36h                            ; checksum of URLDownloadToFileA
    call GetAPIaddress

• Backdoor.Darkmoon uses a more complicated hashing algorithm to resolve API addresses.
• The resolved API addresses are often stored in local (stack) variables.
 – This makes it difficult to resolve API names by inspecting memory dumps, because most stack variables are usually lost in a memory dump.
• Otherwise they are stored in heap (allocated) memory.
 – APIs are referenced through a structure, such as [esi+24h]. It is difficult for a generic tool to know where the register esi points, which makes it hard to resolve API names by inspecting memory dumps.

The Use of LoadLibrary() and GetProcAddress()
• Which APIs can be called is easily learned by inspecting the Import Table.
 – If bind, listen, send, recv, RegSetValue, CreateRemoteThread and/or SetWindowsHook is imported, the program will attract the attention of virus analysts.
• If LoadLibrary() + GetProcAddress() is used instead of static linking, the Import Table does not reveal the suspicious imports.
 – W32.Spybot.Worm often does this, although the parameter strings passed to GetProcAddress(), such as "bind", "listen" and so on, remain visible.
• If both the ordinary strings and the API name strings are encrypted, analysis takes more time than if they are not.
 – W32.Stration, prevalent from 2006 through 2007, does that.

Sample: W32.Stration.CX@mm
    00401EE0 sub_401EE0 proc near
    00401EE0 var_18 = dword ptr -18h
    00401EE0 var_14 = dword ptr -14h
    00401EE0 var_10 = dword ptr -10h
    00401EE0 var_C = dword ptr -0Ch
    00401EE0 var_8 = dword ptr -8
    00401EE0 var_4 = byte ptr -4
    00401EE0 arg_0 = dword ptr 4
    00401EE0 arg_4 = dword ptr 8
    00401EE0
    00401EE0 mov eax, dword_404118            ; saved API address
    00401EE5 or byte ptr word_40401C, 3Dh
    00401EEC sub esp, 18h
    00401EEF test eax, eax
    00401EF1 jnz short loc_401F48
    00401EF3 mov eax, ds:dword_4010C0         ; 637E7640h
    00401EF8 mov ecx, ds:dword_4010C4         ; 44657851h
    00401EFE mov edx, ds:dword_4010C8         ; 7B70797Eh
    00401F04 mov [esp+18h+var_18], eax
    00401F07 mov eax, ds:dword_4010CC         ; 7D755872h
    00401F0C mov [esp+18h+var_14], ecx
    00401F10 mov ecx, ds:dword_4010D0         ; 17637472h
    00401F16 mov [esp+18h+var_10], edx
    00401F1A mov dl, ds:byte_4010D4           ; 0
    00401F20 mov [esp+18h+var_C], eax
    00401F24 mov [esp+18h+var_8], ecx
    00401F28 mov [esp+18h+var_4], dl
    00401F2C xor eax, eax
    00401F2E mov edi, edi
    00401F30 loc_401F30:
    00401F30 xor byte ptr [esp+eax+18h+var_18], 17h ; decrypting
    00401F34 inc eax
    00401F35 cmp eax, 14h
    00401F38 jl short loc_401F30
    00401F3A lea eax, [esp+18h+var_18]
    00401F3D push eax                         ; 'WaitForSingleObject'
    00401F3E call sub_401E40                  ; get the API address
    00401F43 mov dword_404118, eax            ; save the API address for the next time
    00401F48
    00401F48 loc_401F48:
    00401F48 mov ecx, [esp+18h+arg_4]
    00401F4C mov edx, [esp+18h+arg_0]
    00401F50 push ecx
    00401F51 push edx
    00401F52 call eax                         ; call the API
    00401F54 add esp, 18h
    00401F57 retn 8
    00401F57 sub_401EE0 endp
It is difficult for a tool to resolve API names from memory dumps, because not all of the routines have been executed by the time the memory dumps are taken.

Basic API Obfuscation
• Most Windows viruses are written in high-level languages, such as C, C++, Visual Basic and Delphi.
• Runtime API obfuscation requires some code modification. Most virus authors choose to use special libraries or packer software to obfuscate APIs.
 – If a special library is linked instead of a regular import library, a plain call to an API will be redirected to obfuscating code, sometimes with redundant junk code.
 – If a packer with API obfuscation functionality is applied to a compiled EXE/DLL program, the virus author does not have to think about API obfuscation at all and simply uses the packer.
 – Usually the APIs are resolved before the program proper starts, but after the unpacking processes have completed. The obfuscating code, if any, is also generated at the time of API resolution.

Staged API Obfuscation
Zero stage (usual API call):
    call ds:GetSystemTime
or
    call j_GetSystemTime
    j_GetSystemTime: jmp ds:GetSystemTime
One-stage API obfuscation:
    call sub_414258
    sub_414258:                               ; exists just to redirect the flow
    mov eax, ds:GetSystemTime
    jmp eax
A de-obfuscation tool should generate an IDC script that renames the label sub_414258 to "GetSystemTime."
There are also multi-stage API obfuscations.

Extra-modular one-stage API obfuscation:
    4010C0: call ds:[402780h]
    ...
    402780: dd 00370000h
    370000: mov eax, 77E7B476h                ; CreateFileA
    370005: jmp eax
370000h resides in a different memory block, which is out of the scope of the module being analyzed.
After de-obfuscation:
    4010C0: call ds:[CreateFileA]
    ...
    402780: CreateFileA dd 00370000h
A de-obfuscation tool should rename this label.
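The extra-modular one-stage case above lends itself to simple pattern matching. The following Python sketch is an illustration under stated assumptions, not the tool described in this presentation: it recognizes only the `mov eax, imm32` / `jmp eax` stub (bytes B8 imm32 FF E0), and the helper names `follow_one_stage_stub` and `idc_rename_for_entry` as well as the `exports` dictionary (API address to name, as obtained by parsing the export tables of the system DLLs) are hypothetical. It emits a `MakeName` line in the style of the IDC scripts discussed in this presentation.

```python
import struct
from typing import Dict, Optional


def follow_one_stage_stub(block: bytes, block_base: int, addr: int) -> Optional[int]:
    """Return the target of a 'mov eax, imm32 / jmp eax' stub, or None if it does not match."""
    off = addr - block_base
    if off < 0 or off + 7 > len(block):
        return None                                   # stub lies outside this memory block
    if block[off] == 0xB8 and block[off + 5:off + 7] == b"\xFF\xE0":
        return struct.unpack_from("<I", block, off + 1)[0]
    return None


def idc_rename_for_entry(entry_addr: int, entry_value: int, block: bytes,
                         block_base: int, exports: Dict[int, str]) -> Optional[str]:
    """Build an IDC MakeName line for a table entry that points at a stub."""
    target = follow_one_stage_stub(block, block_base, entry_value)
    name = exports.get(target) if target is not None else None
    if name is None:
        return None
    return 'MakeName(0x%X, "%s");' % (entry_addr, name)


# Values taken from the example above: the entry at 402780h points to 370000h.
stub = bytes.fromhex("B8 76 B4 E7 77 FF E0")          # mov eax, 77E7B476h / jmp eax
exports = {0x77E7B476: "CreateFileA"}                  # assumed export map
print(idc_rename_for_entry(0x402780, 0x370000, stub, 0x370000, exports))
# prints: MakeName(0x402780, "CreateFileA");
```

Returning None for anything that does not match keeps the handler conservative, so other stage patterns can be tried by separate handlers.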
Extra-modular Function Table
    00404A13 call dword ptr ds:0B5A068h
    ...
    00B5A068 dd 7743DE3Ah                     ; SHFileOperationA
0B5A068h resides in a different memory block, which is out of the scope of the module being analyzed.
After de-obfuscation:
    00404A13 call dword ptr ds:0B5A068h       ; SHFileOperationA
Because we cannot rename the label at 0B5A068h, which is out of the scope of the module, we can only add a comment at 00404A13h.

Immediate Jumps
    004023A8 call ds:label_4130B4
    ...
    004130B4 label_4130B4 dd 972030h          ; pointing to a different memory block
    ...
    00972030 jmp near ptr 77E5B476h           ; CreateFileA
A de-obfuscation tool should rename the label "label_4130B4" to "CreateFileA."
This is not jmp [nnnnnnnn], but jmp nnnnnnnn. It is rare to see a direct (immediate) jump to another module, because the operand of jmp must be relative to the jmp instruction itself, requiring an additional calculation step.

Jump-in
• Regardless of API obfuscation, an API call must eventually reach the entry point of the target API.
 – If the call reaches 77E5B476h and 77E5B476h is the entry point of CreateFileA in kernel32.dll, it is de-obfuscated.
    call xxxx
    xxxx: jmp ds:[yyyy]
    yyyy: dd zzzz
    zzzz: mov edx, 77E5B476h
          push edx
          ret
Exports of kernel32.dll:
    …
    77E5B091 : CreateFileW
    77E5A47F : CreateFileMappingW
    77E5A543 : CreateFileMappingA
    77E5B476 : CreateFileA
    77E546E8 : CreateFiberEx
    77E546D0 : CreateFiber
    77E54D43 : CreateEventW
    77E54DE3 : CreateEventA
    …

    00401922 call sub_403C08
    ...
    00403C08 jmp ds:off_404090
    ...
    00404090 off_404090 dd offset unk_40C4A1
    ...
    0040C4A1 unk_40C4A1: jmp near ptr 0040C4A4h   ; dummy (junk) code
    0040C4A3 db 0EAh                          ; dummy byte to distract
    0040C4A4 push 0
    0040C4A6 jmp near ptr 77E41BECh           ; this is the address of Sleep + 2
Exports of kernel32.dll:
    …
    77E545BE : SwitchToFiber
    77E74D56 : SuspendThread
    77E5A325 : SleepEx
    77E41BEA : Sleep
    77E4D2CF : SizeOfResource
    …
    Sleep proc near
    push 0                                    ; "push 0" takes 2 bytes
    push dword ptr [esp+8]
    call SleepEx
    retn 4
    Sleep endp
This method can also bypass API hooks and breakpoints, because the first instruction within the API routine is never executed.

Logic Stage and Skipper Stage
    00D50000 sbb ecx,61h                      ; meaningless instruction
    00D50003 jmp short 00D50006h
    00D50005 db 0E9h                          ; placed to obfuscate in disassembler
    00D50006 mov ecx, 486366h                 ; meaningless instruction
    00D5000B pop eax                          ; return address
    00D5000C lea eax,[eax+1]                  ; return address += 1
    00D5000F push eax                         ; return address is now incremented (skipper)
    00D50010 push 0D40000h                    ; the address of the next stage
    00D50015 retn                             ; jump to the next stage
The skipper stage increments the return address. The logic stage has some instructions other than calls and jumps. But there are some patterns, and emulation is not necessary.

The program proper calls the obfuscated API (observed in Backdoor.Graybird):
    00411000 push eax
    00411001 call ds:[420008h]                ; points to logic and skipper stage
    00411007 db 0E9h                          ; a skipped byte
    00411008 or eax, eax                      ; instruction pointer returns here from the call
IDA is confused by 0E9h, which is the opcode of a jmp instruction, and shows "jmp xxxxxxxx" if a de-obfuscation tool does not skip this byte.
Sample IDC script to deal with this skipper stage:
    MakeName(0x420008,"RegOpenKeyEx");        // Rename the label
    MakeUnknown(0x411007,1,0);                // Undefine the skipped byte
    MakeCode(0x411008);                       // Re-analyze code

Copied and Substituted Obfuscation
Copied API Obfuscation
    01230000 mov eax,fs:[18h]
    01230006 mov ecx,[eax+30h]
    01230009 mov eax,[ecx+0B0h]
    0123000F movzx edx, word ptr [ecx+0ACh]
    01230016 xor eax,0FFFFFFFEh
    01230019 shl eax,0Eh
    0123001C or eax,edx
    0123001E shl eax,8
    01230021 or eax,[ecx+0A8h]
    01230027 shl eax,8
    0123002A or eax,[ecx+0A4h]
    01230030 ret
This is an entire copy of the GetVersion API in kernel32.dll.
• Since it does not refer to any OS modules, it is impossible to know which API is called from the address it reached, or indeed whether it is an API call at all.
• We have to search the major DLLs for the API function that matches the copied API.

• A copied API may change its binary machine code when the relative distance of a jump or call changes.
 – Short jump (8-bit displacement, -128 to +127) vs. near jump (32-bit displacement in Win32 code)
 – The operand of a near call is relative to the caller address
A copy of the Sleep API from kernel32.dll:
    00402000 6A 00          push 0
    00402002 FF 74 24 08    push [esp+8]
    00402006 E8 1A 83 A5 77 call SleepEx @ 77E5A325
    0040200B C2 04 00       ret 4
The genuine Sleep API in kernel32.dll:
    77E41BEA 6A 00          push 0
    77E41BEC FF 74 24 08    push [esp+8]
    77E41BF0 E8 30 87 01 00 call SleepEx @ 77E5A325
    77E41BF5 C2 04 00       ret 4
We have to compare the functions not by machine code but by logical meaning. Because the comparison is time-consuming, we should optimize performance by ordering the major DLLs, for example kernel32.dll > advapi32.dll > user32.dll > shell32.dll.

Substituted API Obfuscation
    00F40000 mov eax,fs:[18h]
    00F40006 mov eax,[eax+34h]
    00F40009 ret
• This function is found in ntdll.dll as RtlGetLastWin32Error.
• Most engineers are more familiar with GetLastError in kernel32.dll than with RtlGetLastWin32Error.
 – In fact, GetLastError in kernel32.dll is just redirected to RtlGetLastWin32Error in ntdll.dll.
• A de-obfuscation tool can keep a list of well-known and possibly substituted APIs and present the more commonly used API names.

Push-ret and Push-calc-ret Obfuscation
Push-ret Obfuscation
    push 71A23ECEh                            ; bind()
    ret
Push-calc-ret Obfuscation
    003C80B0 call dword ptr [3E82B8h]         ; calls 17B000Dh
    ...
    003E83B8 dd 17B000Dh                      ; in another memory block
    ...
    017B000D push 3E62B8CDh
    017B0012 sub dword ptr [esp], 0CCC079FFh  ; = 71A23ECEh (bind())
    017B0019 ret
A de-obfuscation tool should detect this pattern and calculate the value at the top of the stack.

Enhanced Push-ret Obfuscation
    004014DA mov esi, offset unk_404907       ; stores DWORD-value list
    004014DF push dword ptr [esi+30h]         ; pushes 8DC82618h
    004014E2 push loc_4014ED                  ; return address
    004014E7 push loc_4010A4                  ; call destination
    004014EC ret                              ; calls 4010A4h
    004014ED <next instruction>               ; returns here
    ...
    004010A4 mov edx, [esp+4]                 ; 8DC82618h <-- came from [esi+30h]
    004010A8 mov ecx, [esp+0]                 ; 004014EDh (return address)
    004010AB add esp, 8
    004010AE ror edx, 0FAh
    004010B1 sub edx, dword_404027            ; == 0FA23D1ADh
    004010B7 push ecx                         ; returning address
    004010B8 push edx                         ; API address of CreateFileA
    004010B9 ret                              ; jumps to CreateFileA
There is no call instruction. It does not look like an API call.
1. Detect this pattern:
    mov esi, xxxx
    push dword ptr [esi+xx]
    push xxxx
    push xxxx
    ret
2. Read the value at [esi+xx] and get the API name.
3. Add the comments in IDA:
    004014DA mov esi, offset unk_404907
    004014DF push dword ptr [esi+30h]         ; calls CreateFileA
    004014E2 push loc_4014ED                  ; returns to 4014EDh
    004014E7 push loc_4010A4
    004014EC ret
    004014ED <next instruction>

Padded and Copied API Obfuscation (Themida)
    00401B77 call 2930000h
    ...
    02930000 push edx                         ; making room for EBP
    02930001 push eax                         ; save EAX
    02930002 push edx                         ; save EDX
    02930003 jmp 293000Eh
    ...
    0293000E rdtsc                            ; destroys EDX:EAX
    02930010 jmp 2930029h
    ...
    02930029 pop edx                          ; restore EDX
    0293002A pop eax                          ; restore EAX
    0293002B mov [esp],ebp
The initial code is copied from the API function, but replaced with longer instructions and interleaved with redundant instructions. But it has some patterns.
1. Remove the redundant instructions, according to an internal redundant code block list:
    00401B77 call 2930000h
    ...
    02930000 push edx                         ; making room for EBP
    0293002B mov [esp],ebp
2. Replace the instructions according to an internal conversion table:
    push reg32(1)
    mov [esp], reg32(2)
  becomes
    push reg32(2)
  which gives
    00401B77 call 2930000h
    ...
    02930000 push ebp
3. Follow the steps necessary for Jump-in obfuscation.

Padded and Copied API Obfuscation (Enigma)
    00401753 call dword ptr ds:973245h        ; it points to 974819h
    ...
    00974819 call 97481Fh
    0097481E push esi                         ; dummy instruction
    0097481F call 974827h
    00974824 jmp 97482Ah
    00974826 db 15h                           ; dummy code
    00974827 ret 4
    0097482A add esp, 5C9099Bh
    00974830 mov [esp-5C9099Fh],esi           ; mov [esp-4], esi
    00974837 add esp, 0FA36F661h              ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
    0097483D call 974846h
    00974842 db 80h, 0DEh, 9Dh, 70h           ; dummy code
    00974846 add esp,4
It is similar to Themida, but the pattern is different.
1. Remove the redundant instructions, according to an internal redundant code block list:
    00401753 call dword ptr ds:973245h        ; it points to 974819h
    ...
    00974819 ….
    0097482A add esp, 5C9099Bh
    00974830 mov [esp-5C9099Fh],esi           ; mov [esp-4], esi
    00974837 add esp, 0FA36F661h              ; sub esp, 4 (5C9099Bh + 0FA36F661h == -4)
2. Replace the instructions according to an internal conversion table:
    add esp, n
    mov [esp-(n+4)], reg32(1)
    add esp, (100000000h – (n+4))
  becomes
    push reg32(1)
  which gives
    00401753 call dword ptr ds:973245h
    ...
    00974819 push esi
3. Follow the steps necessary for Jump-in obfuscation.

Splicing Intensive Instructions to Provide Obfuscation (Obsidium)
• Sometimes emulation is required to obtain the target API address.
 – Example 1: All API calls reach a common dispatcher routine, which receives the EDX register as a parameter. EDX is converted to an index into a table of API addresses.
 – Example 2: All API calls reach a common dispatcher routine, which receives no parameter. Instead, it determines where it was called from by reading the stack. Then it refers to a table to get the target API address from the return address on the stack.
    Example 1: EDX → Dispatcher → Table → API
    Example 2: Return address in stack → Dispatcher → Table → API
Emulation is effective in dealing with these kinds of API obfuscation. But it is time-consuming and sometimes yields wrong answers. Emulation should be a last resort, employed only when the use of conventional analytical techniques proves impossible.

Obsidium sets up a Structured Exception Handler (SEH) to continue, while impeding a debugger.
    008B6037 55                push ebp
    008B6038 8B EC             mov ebp, esp
    008B603A 81 EC 30 01 00 00 sub esp, 130h
    008B6040 EB 04             jmp short 008B6046
    008B6046 60                pusha
    008B6047 EB 04             jmp short 008B604D
    008B604D 9C                pushf
    008B604E EB 03             jmp short 008B6053
    008B6053 EB 04             jmp short 008B6059
    008B6059 E8 00 00 00 00    call $+5 (008B605E)
    008B605E EB 01             jmp short 008B6061
    008B6061 5E                pop esi
    008B6062 EB 03             jmp short 008B6067
    008B6067 EB 01             jmp short 008B606A
    008B606A 8B 96 64 03 00 00 lea edx, [esi+364h]    ; exception handler
    008B6070 EB 04             jmp short 008B6076
    008B6076 33 C0             xor eax, eax
    008B6078 EB 03             jmp short 008B607D
    008B607D 52                push edx
    008B607E EB 01             jmp short 008B6081
    008B6081 64 FF 30          push dword ptr fs:[eax]
    008B6084 EB 01             jmp short 008B6087
    008B6087 64 89 20          mov fs:[eax], esp
    008B608A EB 01             jmp short 008B608D
    008B608D EB 03             jmp short 008B6092
    008B6092 EB 02             jmp short 008B6096
    008B6096 EB 36             jmp short 008B60CE
    008B60CE EB 01             jmp short 008B60D1
    008B60D1 8B 54 24 30       mov edx, [esp+30h]
    008B60D5 EB 01             jmp short 008B60D8
    008B60D8 EB C1             jmp short 008B609B
    008B609B EB 02             jmp short 008B609F
    008B609F F7 C2 01 00 00 00 test edx, 1
    008B60A5 EB 04             jmp short 008B60AB
    008B60AB 74 0C             jz 008B60B9
    008B60AD EB 04             jmp short 008B60B3
    008B60B3 0F 0B             ud2                    ; undefined opcode
    008B60B5 EB 02             jmp short 008B60B9
    008B60B9 EB 03             jmp short 008B60BE
    008B60BE F7 F0             div eax                ; division by zero
There are many meaningless short jumps. Let's remove them all.

    008B6037 55                push ebp
    008B6038 8B EC             mov ebp, esp
    008B603A 81 EC 30 01 00 00 sub esp, 130h
    008B6046 60                pusha                  ; push EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI
    008B604D 9C                pushf                  ; push EFLAGS
    008B6059 E8 00 00 00 00    call $+5 (008B605E)
    008B6061 5E                pop esi                ; esi = 008B6061h
    008B606A 8B 96 64 03 00 00 lea edx, [esi+364h]    ; edx = 008B63C5h
    008B6076 33 C0             xor eax, eax
    008B607D 52                push edx               ; exception handler address (008B63C5h)
    008B6081 64 FF 30          push dword ptr fs:[eax]
    008B6087 64 89 20          mov fs:[eax], esp
    008B60D1 8B 54 24 30       mov edx, [esp+30h]     ; value from uninitialized stack variable
    008B609F F7 C2 01 00 00 00 test edx, 1
    008B60AB 74 0C             jz 008B60BE
    008B60B3 0F 0B             ud2                    ; undefined opcode exception
    008B60BE F7 F0             div eax                ; division by zero exception
An exception occurs.
Execution then resumes at 008B63C5h.

Exception handler:
    008B63C5 EB 03             jmp short 008B63CA
    008B63CA E8 00 00 00 00    call $+5 (008B63CF)
    008B63CF EB 02             jmp short 008B63D3
    008B63D3 5A                pop edx
    008B63D4 EB 01             jmp short 008B63D7
    008B63D7 8B 8A 95 FB FF FF mov ecx, [edx-46Bh]
    008B63DD EB 04             jmp short 008B63E3
The exception handler is just a continuation of the API resolution code, not an error handler.

Dummy call-ret
    …
    call 77E7xxxxh            ; an address inside Kernel32.dll that holds: C3  ret
    …
1. The obfuscator searches a system DLL for a 0C3h byte in advance.
2. During an API call, it calls the address where 0C3h was found.
3. But 0C3h is just a "ret" instruction.
A de-obfuscation tool should ignore this dummy call to a system DLL. Otherwise, the tool would think it has reached an API in the DLL.

16-bit addressing mode
    64 67 FF 36 00 00    push dword ptr fs:[0]    ; the 67h prefix switches addressing from 32-bit to 16-bit mode
    64 67 89 26 00 00    mov fs:[0], esp
It is unusual to see 16-bit addressing mode instructions in a Win32 user-mode application. But the emulator should support 16-bit addressing mode.

Conclusion
Uncovering obfuscated API calls is a difficult task, given the wide range of obfuscation techniques that can be used and combined to hide a program's functionality. Emulation centered on the call instruction may initially seem to be an effective method of de-obfuscation, but it is defeated by the copying of API code and may yield false positives when the emulated instruction pointer reaches an OS module. Emulation can also be time-consuming and as such may not be the best choice in situations in which results are required in a timely manner. It is therefore necessary to design a modular de-obfuscation tool able to deal with the myriad of techniques described in this presentation.
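As one illustration of a pattern-based handler that such a modular tool might contain instead of full emulation, the arithmetic of the push-calc-ret and enhanced push-ret examples shown earlier can be reproduced directly. This is a minimal Python sketch; the function names are hypothetical, and the only inputs are the constants that appear in those two listings.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right; x86 masks the rotate count to 5 bits."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32


def push_calc_ret_target(pushed: int, subtrahend: int) -> int:
    """push imm32 / sub dword ptr [esp], imm32 / ret"""
    return (pushed - subtrahend) & MASK32


def enhanced_push_ret_target(table_value: int, rotate: int, key: int) -> int:
    """mov edx, [esp+4] / ror edx, rotate / sub edx, key / push edx / ret"""
    return (ror32(table_value, rotate) - key) & MASK32


# Constants from the push-calc-ret listing: the result is the address of bind().
print(hex(push_calc_ret_target(0x3E62B8CD, 0xCCC079FF)))              # 0x71a23ece

# Constants from the enhanced push-ret listing: the result is CreateFileA.
print(hex(enhanced_push_ret_target(0x8DC82618, 0xFA, 0xFA23D1AD)))    # 0x77e5b476
```

Both results match the addresses given in the comments of the original listings, which suggests that these particular patterns can be resolved by constant folding alone.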
Thank You!
Questions? (in easy English, please)
Questions? (feel free to ask in Japanese)
Questions? (not in Russian, please)
Masaki Suenaga
firstname.lastname@example.org

Copyright © 2009 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.