Windows x64 Shellcode

Introduction

Shellcode refers to a chunk of executable machine code (along with any associated data) which is executed after being injected into the memory of a process usually by means of a buffer-overflow type of security vulnerability.  The term comes from the fact that in early exploits against Unix platforms, an attacker would typically execute code that would start a command shell listening on a TCP/IP port, to which the attacker could then connect and have full access to the system.  For the common web-browser and application exploits on Windows today, the “shellcode” is more likely to download and execute another program than spawn a command shell, but the term remains.

In general, shellcode can be thought of as any code that is capable of being executed from an arbitrary location in memory and without relying on services provided by the operating system loader as with traditional executables. Depending on the exploit, additional requirements for shellcode may include small size and avoiding certain byte patterns in the code. In any case, there are two tasks performed by the loader which shellcode must take care of itself:

  1. Getting the addresses of data elements (such as strings referenced by the code)
  2. Getting the addresses of system API functions used

This article describes a shellcode implementation of the x64 assembly program from my Windows Assembly Languages article (refer to that article for general x64 assembly programming issues such as calling conventions and stack usage).  As you’ll see, the main program code doesn’t look much different.  Task #1 above actually turns out to be a non-issue on x64 platforms due to a new feature called RIP-relative addressing. Task #2 is what comprises the bulk of the effort.  In fact, the code for looking up API functions is significantly larger and more complex than the main program itself.  The only other difference between the vanilla and shellcode versions of x64 hello world is that the shellcode does not use a .data section, instead placing the strings in the .code section after main.  This is because “sections” are a feature of the executable file format, whereas shellcode needs to be just a single block of code and data.

RIP-Relative Addressing

RIP refers to the instruction pointer register on x64, and RIP-relative addressing means that references to memory addresses being read or written can be encoded as offsets from the currently-executing instruction.  This is not a completely new concept, as jmp and call instructions have always supported relative targets on x86, but the ability to read and write memory using relative addressing is new with x64.

On x86, the labels referring to data variables would be replaced with actual hard-coded memory addresses when the program was assembled and linked, under the assumption that the program would be loaded at a specific base address.  If at runtime the program needed to load at a different base address, the loader would perform relocation by updating all of those hard-coded addresses.  Because shellcode needed to run from anywhere in memory, it needed to determine these addresses dynamically and typically used a trick where the call instruction would push the address just past itself onto the stack as the return address.  This “return address” could then be popped off the stack to get a pointer to the string at runtime:

    call skip
    db 'Hello world', 0
skip:
    pop esi      ;esi now points to 'Hello world' string

On x64 we do not need this trick.  RIP-relative addressing is not only supported but is in fact the default, so we can simply refer to strings using labels as with ordinary code and it Just Works.
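
To make the mechanics concrete, here is a minimal C sketch of the arithmetic the processor performs for a RIP-relative operand: the signed 32-bit displacement stored in the instruction is added to the address of the next instruction. The helper name rip_relative_target and the sample addresses are made up for illustration and are not part of the shellcode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes the address a RIP-relative operand refers to.
   insn_addr: address of the first byte of the instruction
   insn_len : total encoded length of the instruction in bytes
   disp32   : signed 32-bit displacement stored in the instruction */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    /* opcode + ModRM + disp32 is at least 6 bytes; x64 caps instructions at 15 */
    assert(insn_len >= 6 && insn_len <= 15);
    /* RIP already points at the NEXT instruction when the operand is resolved */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For example, a 7-byte lea rdx, [rip+disp32] placed at address 0x1000 with a displacement of 0x20 refers to 0x1027. Only the distance between the instruction and the string is encoded, so the reference stays valid wherever the whole block is copied.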

API Lookup Overview

Even the most trivial programs generally need to call various operating system API functions to perform some type of input/output (I/O) – displaying things to the user, accessing files, making network connections, etc.  On Windows these API functions are implemented in various system DLLs, and in standard application development these API functions can simply be referred to by name.  When the program is compiled and linked, the linker puts information in the resulting executable indicating which functions from which DLLs are required.  When the program is run, the loader ensures that the necessary DLLs are loaded and that the addresses of the called functions are resolved.

Windows also provides another facility that can be used by applications to load additional DLLs and look up functions on demand:  the LoadLibrary() and GetProcAddress() APIs in kernel32.dll.  Not having the benefit of the loader, shellcode needs to use LoadLibrary() and GetProcAddress() for all API functions it uses.  This unfortunately presents a Catch-22:  How does the shellcode get the addresses of LoadLibrary() and GetProcAddress()?

It turns out that an equivalent to GetProcAddress() can be implemented by traversing the data structures of a loaded DLL in memory.  Also, kernel32.dll is always loaded in the address space of every process on Windows,  so LoadLibrary() can be found there and used to load other DLLs.

Developing shellcode using this technique requires a solid understanding of the Portable Executable (PE) file format used on Windows for EXE and DLL files, and the next section of this article assumes some familiarity. The following references and tools may be helpful:

  • Matt Pietrek’s An In-Depth Look into the Win32 Portable Executable File Format: part1 and part2.  Note that this only covers 32-bit and not 64-bit PE files, but the differences are very minor – mostly just widening some memory address fields to 64 bits
  • The official Microsoft Portable Executable and Common Object File Format Specification
  • Daniel Pistelli’s CFF Explorer is a nice GUI tool for viewing and editing PE files, with 64-bit support
  • The dumpbin utility included with Visual C++ (including Express Edition) – the most useful switches for our purposes are /headers and /exports
  • Many of the PE data structures are documented in MSDN under ImageHlp Structures
  • Definitions of the data structures can be found in winnt.h in the Include directory of the Windows SDK
  • The dt command in WinDbg is able to display many of these structures

API Lookup Demo

This demonstration of how to find the address of a function in a loaded DLL can be followed by attaching WinDbg to any 64-bit process (I’m using notepad.exe).  Note that the particular values seen here may be different on your system.

First we’ll get the address of the Thread Environment Block (TEB), sometimes also referred to as the Thread Information Block (TIB). The TEB contains a large number of fields pertaining to the current thread, and on x64 the fields can be accessed as offsets from the GS segment register during program execution (the FS register was used on x86). In WinDbg, the pseudo register $teb contains the address of the TEB.

0:001> r $teb
$teb=000007fffffdb000
0:001> dt _TEB @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x038 EnvironmentPointer : (null)
   +0x040 ClientId         : _CLIENT_ID
   +0x050 ActiveRpcHandle  : (null)
   +0x058 ThreadLocalStoragePointer : (null)
   +0x060 ProcessEnvironmentBlock : 0x000007ff`fffdd000 _PEB
   +0x068 LastErrorValue   : 0
   [...]

The only field from the TEB we are interested in is the pointer to the Process Environment Block (PEB). Note that WinDbg also has a $peb pseudo-register, but in the shellcode implementation we will have to use the GS register to go through the TEB first.

0:001> dt _PEB 7ff`fffdd000
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x008 Mutant           : 0xffffffff`ffffffff Void
   +0x010 ImageBaseAddress : 0x00000000`ff8b0000 Void
   +0x018 Ldr              : 0x00000000`779a3640 _PEB_LDR_DATA
   [...]

The PEB contains numerous fields with process-specific data and we are interested in the Ldr field at offset 0x18 which points to a structure of type PEB_LDR_DATA.

0:001> dt _PEB_LDR_DATA 779a3640
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x58
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x010 InLoadOrderModuleList : _LIST_ENTRY [ 0x00000000`00373040 - 0x39a3b0 ]
   +0x020 InMemoryOrderModuleList : _LIST_ENTRY [ 0x00000000`00373050 - 0x39a3c0 ]
   +0x030 InInitializationOrderModuleList : _LIST_ENTRY [ 0x00000000`00373150 - 0x39a3d0 ]
   +0x040 EntryInProgress  : (null)
   +0x048 ShutdownInProgress : 0 ''
   +0x050 ShutdownThreadId : (null)

The PEB_LDR_DATA structure contains three linked lists of loaded modules – InLoadOrderModuleList, InMemoryOrderModuleList, and InInitializationOrderModuleList. A module or image refers to any PE file in memory – the main program executable as well as any currently-loaded DLLs. All three lists contain the same elements just in a different order, with the one exception that InInitializationOrderModuleList only contains DLLs and excludes the main executable.

The elements of these lists are of type LDR_DATA_TABLE_ENTRY, though you can’t tell from the previous output because they are only shown as LIST_ENTRY, which is the generic linked list header datatype used throughout Windows.  A LIST_ENTRY simply consists of a forward and back pointer for creating circular, doubly-linked lists.  The address of the _LIST_ENTRY within the _PEB_LDR_DATA structure represents the list head. When traversing the circular list, arriving back at the list head signals that every element has been visited.

0:001> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr64 _LIST_ENTRY
   +0x008 Blink            : Ptr64 _LIST_ENTRY
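
The traversal rule described above (start at the head's Flink, stop on arriving back at the head) can be sketched in portable C. This is only an illustration: list_entry and module_node are simplified stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for LIST_ENTRY (Flink/Blink), for illustration only. */
typedef struct list_entry {
    struct list_entry *flink;   /* next element, or the head after the last one */
    struct list_entry *blink;   /* previous element, or the head before the first */
} list_entry;

/* Hypothetical element type. As with InLoadOrderLinks in LDR_DATA_TABLE_ENTRY,
   the links are the first member, so a list_entry pointer can be cast back. */
typedef struct module_node {
    list_entry links;
    const char *name;
} module_node;

/* An empty circular list is a head that points at itself. */
static void list_init(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

/* Insert entry just before the head, i.e. at the tail of the list. */
static void list_append(list_entry *head, list_entry *entry)
{
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

/* Walk forward from the head; arriving back at the head ends the walk. */
static module_node *find_module(list_entry *head, const char *name)
{
    assert(head != NULL && name != NULL);
    for (list_entry *e = head->flink; e != head; e = e->flink) {
        module_node *m = (module_node *)e;   /* valid: links is the first member */
        if (strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;   /* wrapped around without finding a match */
}
```

Note that the head itself is not an element. It lives inside PEB_LDR_DATA rather than inside an LDR_DATA_TABLE_ENTRY, which is why a traversal has to remember its address and compare against it.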

The !list command provides the ability to traverse these types of lists and execute a specific command for each element in the list (in this case displaying the element as an LDR_DATA_TABLE_ENTRY data structure). WinDbg commands can get nasty-looking sometimes but are quite powerful. Here we display the InLoadOrderModuleList with list head at offset 0x10 from the beginning of the PEB_LDR_DATA structure (very long output truncated to show just part of one element):

0:001> !list -t ntdll!_LIST_ENTRY.Flink -x "dt _LDR_DATA_TABLE_ENTRY @$extret" 779a3640+10
   [...]
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x00000000`00333620 - 0x333130 ]
   +0x010 InMemoryOrderLinks : _LIST_ENTRY [ 0x00000000`00333630 - 0x333140 ]
   +0x020 InInitializationOrderLinks : _LIST_ENTRY [ 0x00000000`003344e0 - 0x333640 ]
   +0x030 DllBase          : 0x00000000`77650000 Void
   +0x038 EntryPoint       : 0x00000000`7766eff0 Void
   +0x040 SizeOfImage      : 0x11f000
   +0x048 FullDllName      : _UNICODE_STRING "C:\Windows\system32\kernel32.dll"
   +0x058 BaseDllName      : _UNICODE_STRING "kernel32.dll"
   +0x068 Flags            : 0x84004
   [...]

Interesting fields for us within an LDR_DATA_TABLE_ENTRY structure are DllBase at 0x30 and BaseDllName at 0x58. Note that BaseDllName is a UNICODE_STRING, which is an actual data structure and not simply a null-terminated Unicode string. The pointer to the actual string data (the Buffer field) is at offset 0x8 within that structure, for a total of 0x60 from the start of the LDR_DATA_TABLE_ENTRY.

0:001> dt _UNICODE_STRING
ntdll!_UNICODE_STRING
   +0x000 Length           : Uint2B
   +0x002 MaximumLength    : Uint2B
   +0x008 Buffer           : Ptr64 Uint2B
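
As a rough illustration of why the structure matters, the following C sketch compares a counted UTF-16 name against an uppercase ASCII name. The unicode_string type is a portable stand-in for UNICODE_STRING and module_name_equals is a hypothetical helper. It is deliberately stricter than the assembly listing later in the article, which looks only at the low byte of each character and stops comparing as soon as the requested name ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in mirroring the UNICODE_STRING layout shown by dt. */
typedef struct unicode_string {
    uint16_t length;          /* size of the string in BYTES, no terminator */
    uint16_t maximum_length;  /* capacity of buffer in bytes */
    const uint16_t *buffer;   /* UTF-16 code units; not necessarily NUL-terminated */
} unicode_string;

/* Hypothetical helper: returns 1 if the counted UTF-16 string equals the
   NUL-terminated uppercase ASCII name, ignoring the case of a-z. */
static int module_name_equals(const unicode_string *us, const char *upper_ascii)
{
    size_t units, i;

    assert(us != NULL && us->buffer != NULL && upper_ascii != NULL);
    units = us->length / 2;                 /* Length counts bytes, not characters */
    for (i = 0; i < units; i++) {
        uint16_t c = us->buffer[i];
        if (c >= 'a' && c <= 'z')
            c = (uint16_t)(c - 0x20);       /* fold a-z to A-Z */
        if (upper_ascii[i] == '\0' || c != (unsigned char)upper_ascii[i])
            return 0;                       /* ran out of name, or mismatch */
    }
    return upper_ascii[i] == '\0';          /* both strings must end together */
}
```

Using Length rather than searching for a terminator is the safer habit with UNICODE_STRING, since the buffer is not guaranteed to be null-terminated.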

Armed with this knowledge, we now have the ability to obtain the base address of any DLL given its name. Once we have the base address we can traverse the DLL in memory to locate any function exported by the DLL. Also note that the return value of LoadLibrary() is in fact a DLL base address. The base address of a loaded DLL can also be obtained in WinDbg with the lm command. Let’s take a look at kernel32.dll:

0:001> lm m kernel32
start             end                 module name
00000000`77650000 00000000`7776f000   kernel32   (deferred)

An interesting feature of the PE file and loader is that the PE file format in memory is exactly the same as it is on disk, at least as far as the headers. It’s not exactly true that the entire file is read verbatim into memory, because each section is loaded at a certain byte alignment in memory (typically a multiple of 4096, the virtual memory page size) that may be different from where it falls in the file. Also, some sections (like a debug data section) may not be read into memory at all. However, when we look at the DLL base address in memory, we can expect to find what we see at the beginning of any PE file: a DOS “MZ” header. That’s an IMAGE_DOS_HEADER structure to be exact:

0:001> dt _IMAGE_DOS_HEADER 77650000
ntdll!_IMAGE_DOS_HEADER
   +0x000 e_magic          : 0x5a4d
   +0x002 e_cblp           : 0x90
   +0x004 e_cp             : 3
   +0x006 e_crlc           : 0
   +0x008 e_cparhdr        : 4
   +0x00a e_minalloc       : 0
   +0x00c e_maxalloc       : 0xffff
   +0x00e e_ss             : 0
   +0x010 e_sp             : 0xb8
   +0x012 e_csum           : 0
   +0x014 e_ip             : 0
   +0x016 e_cs             : 0
   +0x018 e_lfarlc         : 0x40
   +0x01a e_ovno           : 0
   +0x01c e_res            : [4] 0
   +0x024 e_oemid          : 0
   +0x026 e_oeminfo        : 0
   +0x028 e_res2           : [10] 0
   +0x03c e_lfanew         : 0n224

The e_lfanew field at 0x3c (which for some reason is displayed as a decimal number even though everything else is hex) contains the byte offset to the NT header (IMAGE_NT_HEADERS64). Converting 224 to hex 0xe0 and adding it to the image base gives the address of the NT header, 0x776500e0. We can use the -r option (recursive) to expand the embedded OptionalHeader field (which is a misnomer as it is required and always present):

0:001> dt -r _IMAGE_NT_HEADERS64 776500e0
ntdll!_IMAGE_NT_HEADERS64
   +0x000 Signature        : 0x4550
   +0x004 FileHeader       : _IMAGE_FILE_HEADER
      +0x000 Machine          : 0x8664
      +0x002 NumberOfSections : 6
      +0x004 TimeDateStamp    : 0x4a5bdfdf
      +0x008 PointerToSymbolTable : 0
      +0x00c NumberOfSymbols  : 0
      +0x010 SizeOfOptionalHeader : 0xf0
      +0x012 Characteristics  : 0x2022
   +0x018 OptionalHeader   : _IMAGE_OPTIONAL_HEADER64
      +0x000 Magic            : 0x20b
      +0x002 MajorLinkerVersion : 0x9 ''
      +0x003 MinorLinkerVersion : 0 ''
      [...]
      +0x068 LoaderFlags      : 0
      +0x06c NumberOfRvaAndSizes : 0x10
      +0x070 DataDirectory    : [16] _IMAGE_DATA_DIRECTORY
      [...]

The DataDirectory field is located a total of 0x88 bytes from the NT headers (offset 0x70 from OptionalHeader which is 0x18 from the NT headers). This is an array of 16 elements corresponding to the various types of data in a PE file.

0:001> dt -a16c _IMAGE_DATA_DIRECTORY 776500e0+88
ntdll!_IMAGE_DATA_DIRECTORY
[0] @ 0000000077650168 +0x000 VirtualAddress 0xa0020  +0x004 Size 0xac33
[1] @ 0000000077650170 +0x000 VirtualAddress 0xf848c  +0x004 Size 0x1f4
[2] @ 0000000077650178 +0x000 VirtualAddress 0x116000  +0x004 Size 0x520
[3] @ 0000000077650180 +0x000 VirtualAddress 0x10c000  +0x004 Size 0x9810
[4] @ 0000000077650188 +0x000 VirtualAddress 0  +0x004 Size 0
[5] @ 0000000077650190 +0x000 VirtualAddress 0x117000  +0x004 Size 0x7a9c
[6] @ 0000000077650198 +0x000 VirtualAddress 0x9b7dc  +0x004 Size 0x38
[7] @ 00000000776501a0 +0x000 VirtualAddress 0  +0x004 Size 0
[8] @ 00000000776501a8 +0x000 VirtualAddress 0  +0x004 Size 0
[9] @ 00000000776501b0 +0x000 VirtualAddress 0  +0x004 Size 0
[10] @ 00000000776501b8 +0x000 VirtualAddress 0  +0x004 Size 0
[11] @ 00000000776501c0 +0x000 VirtualAddress 0x2d8  +0x004 Size 0x408
[12] @ 00000000776501c8 +0x000 VirtualAddress 0x9c000  +0x004 Size 0x1c70
[13] @ 00000000776501d0 +0x000 VirtualAddress 0  +0x004 Size 0
[14] @ 00000000776501d8 +0x000 VirtualAddress 0  +0x004 Size 0
[15] @ 00000000776501e0 +0x000 VirtualAddress 0  +0x004 Size 0

We are interested in the Export Directory which is the first one in the list having VirtualAddress 0xa0020 and Size 0xac33. See the MSDN documentation of the IMAGE_DATA_DIRECTORY structure for a reference on which type of data goes with each array element.
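
The header walk so far (e_lfanew at 0x3c, then 0x18 to the optional header, then 0x70 to the data directories) can be sketched in portable C over a raw byte buffer. This is a simplified illustration for 64-bit (PE32+) images only; data_directory_offset and the read helpers are hypothetical names, and the fixed count of 16 entries is an assumption taken from the output above rather than read from NumberOfRvaAndSizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian reads that do not depend on host byte order or alignment. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: offset from the image base of DataDirectory[index]
   in a PE32+ image, or 0 if the headers do not look right. */
static uint32_t data_directory_offset(const uint8_t *image, size_t size, unsigned index)
{
    uint32_t e_lfanew;

    assert(image != NULL);
    if (index >= 16 || size < 0x40)
        return 0;
    if (read_u16(image) != 0x5a4d)                   /* e_magic, "MZ" */
        return 0;
    e_lfanew = read_u32(image + 0x3c);               /* offset to NT headers */
    if (e_lfanew > size || size - e_lfanew < (size_t)(0x88 + 16 * 8))
        return 0;                                    /* headers would not fit */
    if (read_u32(image + e_lfanew) != 0x00004550)    /* Signature, "PE\0\0" */
        return 0;
    if (read_u16(image + e_lfanew + 0x18) != 0x20b)  /* OptionalHeader.Magic, PE32+ */
        return 0;
    /* 0x18 to OptionalHeader + 0x70 to DataDirectory, 8 bytes per entry */
    return e_lfanew + 0x18 + 0x70 + index * 8;
}
```

With e_lfanew equal to 0xe0 as in the kernel32.dll example, index 0 yields 0x168, which agrees with the address 0x77650168 shown for element [0] above.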

The VirtualAddress field holds what is more precisely called a Relative Virtual Address (RVA): an offset from the base load address of the module rather than an absolute address. RVAs are used extensively in PE files, including for the pointers to the function names and function addresses in the export table. To get the actual memory address pointed to by an RVA, simply add the base address of the module.

(For convenience, note that the !dh command can be used to automatically display much of the PE header information we’ve extracted manually so far.)

Given that the Export Directory begins at RVA 0xa0020, we add the base address 0x77650000 and should therefore expect to find an IMAGE_EXPORT_DIRECTORY structure at 0x776f0020. Unfortunately IMAGE_EXPORT_DIRECTORY is not understood by the dt command or documented in MSDN, so we will have to refer to the structure definition in winnt.h:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The best we can do in WinDbg is display the structure as an array of DWORDs and count where things fall using the above structure as a reference.

0:001> dd 776f0020
00000000`776f0020  00000000 4a5bc32c 00000000 000a366c
00000000`776f0030  00000001 0000056a 0000056a 000a0048
00000000`776f0040  000a15f0 000a2b98 000aa10b 000aa12c
[...]
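
The DWORDs in this dump can also be mapped onto the structure mechanically instead of being counted by hand. The following C sketch mirrors IMAGE_EXPORT_DIRECTORY with fixed-width types so that it compiles without winnt.h; the lowercase field names and the helper export_directory_from_dwords are illustrative and not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors IMAGE_EXPORT_DIRECTORY from winnt.h using fixed-width types. */
typedef struct export_directory {
    uint32_t characteristics;
    uint32_t time_date_stamp;
    uint16_t major_version;
    uint16_t minor_version;
    uint32_t name;                      /* RVA of the DLL name string */
    uint32_t base;                      /* starting ordinal number */
    uint32_t number_of_functions;
    uint32_t number_of_names;
    uint32_t address_of_functions;      /* RVA of array of function RVAs */
    uint32_t address_of_names;          /* RVA of array of name RVAs */
    uint32_t address_of_name_ordinals;  /* RVA of array of WORD indexes */
} export_directory;

/* Hypothetical helper: overlay ten DWORDs, as printed by dd, onto the structure. */
static export_directory export_directory_from_dwords(const uint32_t dwords[10])
{
    export_directory dir;

    /* sanity-check the layout before trusting the overlay */
    assert(sizeof dir == 10 * sizeof(uint32_t));
    assert(offsetof(export_directory, number_of_names) == 0x18);
    assert(offsetof(export_directory, address_of_functions) == 0x1c);
    assert(offsetof(export_directory, address_of_names) == 0x20);
    assert(offsetof(export_directory, address_of_name_ordinals) == 0x24);

    memcpy(&dir, dwords, sizeof dir);
    return dir;
}
```

Feeding in the first ten DWORDs shown above gives NumberOfFunctions and NumberOfNames of 0x56a, and the three array RVAs 0xa0048, 0xa15f0 and 0xa2b98 at offsets 0x1c, 0x20 and 0x24.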

Beginning with the 8th DWORD within the structure we will find AddressOfFunctions (0xa0048), followed by AddressOfNames (0xa15f0) and AddressOfNameOrdinals (0xa2b98). These values are RVAs – when we add the DLL base address we will get the memory address of the array. When working with RVAs a lot it can be handy to stash the DLL base address in a pseudo-register because it will be used so frequently. Here is AddressOfNames:

0:001> r $t0=77650000
0:001> dd @$t0+a15f0
00000000`776f15f0  000a3679 000a3691 000a36a6 000a36b5
00000000`776f1600  000a36be 000a36c7 000a36d8 000a36e9
00000000`776f1610  000a370f 000a372e 000a374d 000a375a
[...]

This is an array of RVAs pointing to the function name strings (the size of the array is given by the NumberOfNames field in IMAGE_EXPORT_DIRECTORY). Take a look at the first one (adding DLL base address of course) and we see the name of a function exported from kernel32.dll.

0:001> da @$t0+a3679
00000000`776f3679  "AcquireSRWLockExclusive"

We can ultimately find the address of a function based on the array index of where the name is found in this array. The AddressOfNameOrdinals array is a parallel array to AddressOfNames, which contains the ordinal values associated with each name. An ordinal value is the index which is finally used to look up the function address in the AddressOfFunctions array. (DLLs have the option of exporting functions by ordinal only without even having a function name, and in fact the GetProcAddress() API can be called with a numeric ordinal instead of a string name).

More often than not, the value in each slot of the AddressOfNameOrdinals array has the same value as its array index but this is not guaranteed. Note that AddressOfNameOrdinals is an array of WORDs, not DWORDs. In this case it appears to follow the pattern of each element having the same value as its index.

0:001> dw @$t0+a2b98
00000000`776f2b98  0000 0001 0002 0003 0004 0005 0006 0007
00000000`776f2ba8  0008 0009 000a 000b 000c 000d 000e 000f
00000000`776f2bb8  0010 0011 0012 0013 0014 0015 0016 0017
[...]

Once we have the ordinal number of a function, the ordinal is used as an index into the AddressOfFunctions array:

0:001> dd @$t0+a0048
00000000`776f0048  000aa10b 000aa12c 000044b0 00066b20
00000000`776f0058  00066ac0 0006ad90 0006ae00 0004b7d0
00000000`776f0068  000956e0 0008fbb0 00048cc0 0004b800
[...]
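
The lookup chain just described (index into AddressOfNames, then the WORD at the same index in AddressOfNameOrdinals, then that value as an index into AddressOfFunctions) can be sketched in portable C over a raw image buffer. This is an illustration under simplifying assumptions: find_export_rva and the read helpers are hypothetical names, names are compared exactly with strcmp, and RVAs are not bounds-checked against the image size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian reads that do not depend on host byte order or alignment. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the RVA stored in AddressOfFunctions for the
   named export, or 0 if the name is not present. base points at the start of
   the mapped image; export_dir_rva comes from DataDirectory[0]. */
static uint32_t find_export_rva(const uint8_t *base, uint32_t export_dir_rva,
                                const char *wanted)
{
    const uint8_t *dir, *functions, *names, *ordinals;
    uint32_t number_of_functions, number_of_names, i;

    assert(base != NULL && wanted != NULL);
    dir = base + export_dir_rva;
    number_of_functions = read_u32(dir + 0x14);
    number_of_names     = read_u32(dir + 0x18);
    functions = base + read_u32(dir + 0x1c);   /* array of DWORD RVAs */
    names     = base + read_u32(dir + 0x20);   /* array of DWORD RVAs */
    ordinals  = base + read_u32(dir + 0x24);   /* array of WORDs */

    for (i = 0; i < number_of_names; i++) {
        const char *name = (const char *)(base + read_u32(names + 4 * i));
        if (strcmp(name, wanted) == 0) {
            uint16_t index = read_u16(ordinals + 2 * i);
            if (index >= number_of_functions)
                return 0;                      /* malformed table */
            return read_u32(functions + 4 * index);
        }
    }
    return 0;                                  /* name not exported */
}
```

The value returned is still an RVA. Whether it points at code or at a forwarder string has to be decided separately before adding the base address and calling it.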

The interpretation of the values in this array depends on whether the function is forwarded.  Export Forwarding is a mechanism by which a DLL can declare that an exported function is actually implemented in a different DLL.  If the function is not forwarded, the value is an RVA pointing to the actual function code. If the function is forwarded, the RVA points to an ASCII string giving the target DLL and function name. You can tell in advance if a function is forwarded based on the range of the RVA – the function is forwarded if the RVA falls within the export directory (as given by the VirtualAddress and Size in the IMAGE_DATA_DIRECTORY entry).

You can practically see at a glance which RVAs above are in the vicinity of the export directory addresses we’ve been working with. The first element in the array corresponds to our old friend AcquireSRWLockExclusive which we can see is forwarded to another function in NTDLL:

0:001> da @$t0+aa10b
00000000`776fa10b  "NTDLL.RtlAcquireSRWLockExclusive"
00000000`776fa12b  ""

The third array element, on the other hand, is not forwarded and points directly to the executable code of ActivateActCtx:

0:001> u @$t0+44b0
kernel32!ActivateActCtx:
00000000`776544b0 4883ec28        sub     rsp,28h
00000000`776544b4 4883f9ff        cmp     rcx,0FFFFFFFFFFFFFFFFh
[...]
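
The forwarding test and the handling of a forwarder string can be sketched in C as follows. This is an illustration with hypothetical helper names (is_forwarded, split_forwarder). It assumes the forwarder has the DLL.Function form seen above and, like the assembly listing later in the article, simply appends ".DLL" to the part before the first period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A function RVA that lands inside the export directory is a forwarder. */
static int is_forwarded(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return func_rva >= dir_rva && func_rva - dir_rva < dir_size;
}

/* Hypothetical helper: splits "NTDLL.Name" into "NTDLL.DLL" (written to dll)
   and returns a pointer to "Name" inside the original string.
   Returns NULL if there is no period or dll is too small. */
static const char *split_forwarder(const char *forwarder, char *dll, size_t dll_size)
{
    const char *dot;
    size_t stem;

    assert(forwarder != NULL && dll != NULL);
    dot = strchr(forwarder, '.');
    if (dot == NULL)
        return NULL;
    stem = (size_t)(dot - forwarder);          /* length of the DLL name part */
    if (stem + sizeof ".DLL" > dll_size)       /* sizeof includes the NUL */
        return NULL;
    memcpy(dll, forwarder, stem);
    memcpy(dll + stem, ".DLL", sizeof ".DLL"); /* appends extension and NUL */
    return dot + 1;                            /* function name after the period */
}
```

With the export directory at RVA 0xa0020 and size 0xac33, the first array element 0xaa10b is classified as forwarded and the third element 0x44b0 is not, matching what the debugger output shows.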

We now have all of the understanding we need to get the address of a function and it’s just a matter of implementing the above steps in code.

The Code

Updated 11/10/2011 – thanks to Didier Stevens for pointing out a bug in the error handling.

;shell64.asm
;License: MIT (http://www.opensource.org/licenses/mit-license.php)

 .code

;note: ExitProcess is forwarded
main proc
    sub rsp, 28h            ;reserve stack space for called functions
    and rsp, 0fffffffffffffff0h     ;make sure stack 16-byte aligned   

    lea rdx, loadlib_func
    lea rcx, kernel32_dll
    call lookup_api         ;get address of LoadLibraryA
    mov r15, rax            ;save for later use with forwarded exports

    lea rcx, user32_dll
    call rax                ;load user32.dll

    lea rdx, msgbox_func
    lea rcx, user32_dll
    call lookup_api         ;get address of MessageBoxA

    xor r9, r9              ;MB_OK
    lea r8, title_str       ;caption
    lea rdx, hello_str      ;Hello world
    xor rcx, rcx            ;hWnd (NULL)
    call rax                ;display message box

    lea rdx, exitproc_func
    lea rcx, kernel32_dll
    call lookup_api         ;get address of ExitProcess

    xor rcx, rcx            ;exit code zero
    call rax                ;exit

main endp

kernel32_dll    db  'KERNEL32.DLL', 0
loadlib_func    db  'LoadLibraryA', 0
user32_dll      db  'USER32.DLL', 0
msgbox_func     db  'MessageBoxA', 0
hello_str       db  'Hello world', 0
title_str       db  'Message', 0
exitproc_func   db  'ExitProcess', 0

;look up address of function from DLL export table
;rcx=DLL name string, rdx=function name string
;DLL name must be in uppercase
;r15=address of LoadLibraryA (optional, needed if export is forwarded)
;returns address in rax
;returns 0 if DLL not loaded or exported function not found in DLL
lookup_api  proc
    sub rsp, 28h            ;set up stack frame in case we call loadlibrary

start:
    mov r8, gs:[60h]        ;peb
    mov r8, [r8+18h]        ;peb loader data
    lea r12, [r8+10h]       ;InLoadOrderModuleList (list head) - save for later
    mov r8, [r12]           ;follow _LIST_ENTRY->Flink to first item in list
    cld

for_each_dll:               ;r8 points to current _ldr_data_table_entry

    mov rdi, [r8+60h]       ;UNICODE_STRING at 58h, actual string buffer at 60h
    mov rsi, rcx            ;pointer to dll we're looking for

compare_dll:
    lodsb                   ;load character of our dll name string
    test al, al             ;check for null terminator
    jz found_dll            ;if at the end of our string and all matched so far, found it

    mov ah, [rdi]           ;get character of current dll
    cmp ah, 61h             ;lowercase 'a'
    jl uppercase
    sub ah, 20h             ;convert to uppercase

uppercase:
    cmp ah, al
    jne wrong_dll           ;found a character mismatch - try next dll

    inc rdi                 ;skip to next unicode character
    inc rdi
    jmp compare_dll         ;continue string comparison

wrong_dll:
    mov r8, [r8]            ;move to next _list_entry (following Flink pointer)
    cmp r8, r12             ;see if we're back at the list head (circular list)
    jne for_each_dll

    xor rax, rax            ;DLL not found
    jmp done

found_dll:
    mov rbx, [r8+30h]       ;get dll base addr - points to DOS "MZ" header

    mov r9d, [rbx+3ch]      ;get DOS header e_lfanew field for offset to "PE" header
    add r9, rbx             ;add to base - now r9 points to _image_nt_headers64
    add r9, 88h             ;18h to optional header + 70h to data directories
                            ;r9 now points to _image_data_directory[0] array entry
                            ;which is the export directory

    mov r13d, [r9]          ;get virtual address of export directory
    test r13, r13           ;if zero, module does not have export table
    jnz has_exports

    xor rax, rax            ;no exports - function will not be found in dll
    jmp done

has_exports:
    lea r8, [rbx+r13]       ;add dll base to get actual memory address
                            ;r8 points to _image_export_directory structure (see winnt.h)

    mov r14d, [r9+4]        ;get size of export directory
    add r14, r13            ;add base rva of export directory
                            ;r13 and r14 now contain range of export directory
                            ;will be used later to check if export is forwarded

    mov ecx, [r8+18h]       ;NumberOfNames
    mov r10d, [r8+20h]      ;AddressOfNames (array of RVAs)
    add r10, rbx            ;add dll base

    dec ecx                 ;point to last element in array (searching backwards)
for_each_func:
    lea r9, [r10 + 4*rcx]   ;get current index in names array

    mov edi, [r9]           ;get RVA of name
    add rdi, rbx            ;add base
    mov rsi, rdx            ;pointer to function we're looking for

compare_func:
    cmpsb
    jne wrong_func          ;function name doesn't match

    mov al, [rsi]           ;current character of our function
    test al, al             ;check for null terminator
    jz found_func           ;if at the end of our string and all matched so far, found it
                            ;(prefix match: the export name's own terminator is not checked)

    jmp compare_func        ;continue string comparison

wrong_func:
    loop for_each_func      ;try next function in array
                            ;note: loop stops once rcx reaches 0, so the name at
                            ;index 0 of the array is never compared

    xor rax, rax            ;function not found in export table
    jmp done

found_func:                 ;ecx is array index where function name found

                            ;r8 points to _image_export_directory structure
    mov r9d, [r8+24h]       ;AddressOfNameOrdinals (rva)
    add r9, rbx             ;add dll base address
    mov cx, [r9+2*rcx]      ;get ordinal value from array of words

    mov r9d, [r8+1ch]       ;AddressOfFunctions (rva)
    add r9, rbx             ;add dll base address
    mov eax, [r9+rcx*4]     ;Get RVA of function using index

    cmp rax, r13            ;see if func rva falls within range of export dir
    jl not_forwarded
    cmp rax, r14            ;if r13 <= func < r14 then forwarded
    jae not_forwarded

    ;forwarded function address points to a string of the form <DLL name>.<function>
    ;note: dll name will be in uppercase
    ;extract the DLL name and add ".DLL"

    lea rsi, [rax+rbx]      ;add base address to rva to get forwarded function name
    lea rdi, [rsp+30h]      ;using register storage space on stack as a work area
    mov r12, rdi            ;save pointer to beginning of string

copy_dll_name:
    movsb
    cmp byte ptr [rsi], 2eh     ;check for '.' (period) character
    jne copy_dll_name

    movsb                               ;also copy period
    mov dword ptr [rdi], 004c4c44h      ;add "DLL" extension and null terminator

    mov rcx, r12            ;r12 points to "<DLL name>.DLL" string on stack
    call r15                ;call LoadLibraryA with target dll

    mov rcx, r12            ;target dll name
    mov rdx, rsi            ;target function name
    jmp start               ;start over with new parameters

not_forwarded:
    add rax, rbx            ;add base addr to rva to get function address
done:
    add rsp, 28h            ;clean up stack
    ret

lookup_api endp

end

Building

In the past I had developed 32-bit shellcode using the free and open-source Netwide Assembler (NASM), but when going through the exercise of learning the 64-bit variety I figured I would try it out with the Microsoft Assembler (MASM) instead.  One problem quickly became apparent: MASM offers no way (that I know of) to generate raw binary machine code as opposed to an .exe file!  All is not lost though: the code bytes can be extracted from the .exe file easily enough (but in the future I might go back to NASM).

First build a regular executable (note that no /defaultlib arguments are required – this code does not directly import any functions from DLLs because it looks them up itself):

ml64 shell64.asm /link /entry:main

Then use dumpbin to display the section headers, and take note of the virtual size and file pointer to raw data for the .text section:

dumpbin /headers shell64.exe

SECTION HEADER #1
   .text name
     1B2 virtual size
    1000 virtual address (0000000140001000 to 00000001400011B1)
     200 size of raw data
     200 file pointer to raw data (00000200 to 000003FF)
   [...]

Converting these numbers to decimal, this means we need to extract 434 (0x1b2) bytes beginning at offset 512 (0x200) in the file. This can be done with a hex editor, or with the following command if you have a Windows version of dd lying around (I’m using Cygwin):

dd if=shell64.exe of=shell64.bin bs=1 count=434 skip=512
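
If neither dd nor a hex editor is at hand, the same extraction can be done with a few lines of standard C. This is a minimal sketch rather than a finished tool: extract_range is a hypothetical helper, and the offset and count passed to it would be the file pointer and virtual size reported by dumpbin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies count bytes starting at offset from in to out.
   Returns 0 on success, -1 on a seek, read or write failure (including a
   source file that is too short). */
static int extract_range(FILE *in, long offset, size_t count, FILE *out)
{
    unsigned char buf[512];

    assert(in != NULL && out != NULL);
    if (fseek(in, offset, SEEK_SET) != 0)
        return -1;
    while (count > 0) {
        size_t want = count < sizeof buf ? count : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            return -1;                  /* short file or read error */
        if (fwrite(buf, 1, got, out) != got)
            return -1;
        count -= got;
    }
    return 0;
}
```

For the numbers above the call would be extract_range(in, 0x200, 0x1b2, out). On Windows both files need to be opened in binary mode ("rb" and "wb") so that the C runtime does not translate line endings.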

Now we have a file shell64.bin containing our shellcode.  I like to open it in IDA Pro the first time and make sure it looks right.

Testing

The following test program simply loads data from a file into memory and then transfers execution to it. It supports an optional argument -d which will insert a debugger breakpoint prior to calling the shellcode. All of the error-handling code is long and tedious, yes, but debugging shellcode can be difficult enough without having to worry about whether the test program is working correctly. There is also a free tool called testival available for testing shellcode, which supposedly has some nice features but I have not personally tried it.

Note the call to VirtualProtect() to enable execute permission on the allocated memory. This is necessary because the process heap memory is non-executable by default on 64-bit Windows. This is called Data Execution Prevention (DEP) and was designed specifically as a security measure. Without the VirtualProtect() call, the program will crash with an Access Violation on the first instruction of the shellcode (debugging note: the !vprot command in WinDbg can be used to display the memory permissions for a given address). Bypassing DEP involves a technique called Return-Oriented Programming (ROP) which is beyond the scope of this article (see mitigations section at the end).

Also note the use of a compiler intrinsic to insert the debugger breakpoint. Inline assembly language is not allowed by the x64 Visual C++ compiler, so we can no longer write __asm int 3 to trigger a debugger as in x86 and must use the __debugbreak() intrinsic instead (it produces the same int 3 opcode).  Take a look through intrin.h – there are numerous such intrinsics available.

//runbin.c

#include <windows.h>
#include <stdio.h>
#include <io.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
#include <fcntl.h>
#include <intrin.h>

typedef void (*FUNCPTR)(); 

int main(int argc, char **argv)
{
    FUNCPTR func;
    void *buf;
    int fd, len;
    int debug;
    char *filename;
    DWORD oldProtect;

    if (argc == 3 && strlen(argv[1]) == 2 && strncmp(argv[1], "-d", 2) == 0) {
        debug = 1;
        filename = argv[2];
    } else if (argc == 2) {
        debug = 0;
        filename = argv[1];
    } else {
        fprintf(stderr, "usage: runbin [-d] <filename>\n");
        fprintf(stderr, "  -d    insert debugger breakpoint\n");
        return 1;
    }

    fd = _open(filename, _O_RDONLY | _O_BINARY);

    if (-1 == fd) {
        perror("Error opening file");
        return 1;
    }

    len = _filelength(fd);

    if (-1 == len) {
        perror("Error getting file size");
        return 1;
    }

    buf = malloc(len);

    if (NULL == buf) {
        perror("Error allocating memory");
        return 1;
    }

    if (0 == VirtualProtect(buf, len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        fprintf(stderr, "Error setting memory executable: error code %lu\n", (unsigned long)GetLastError());
        return 1;
    }        

    if (len != _read(fd, buf, len)) {
        perror("Error reading from file");
        return 1;
    }

    func = (FUNCPTR)buf;

    if (debug) {
        __debugbreak();
    }

    func();

    return 0;
}

Build the test program with:

cl runbin.c

Then test the shellcode as follows:

runbin shell64.bin

If all goes well, the message box appears:

HelloWorld

If you want to step through it in a debugger, add the -d option:

runbin -d shell64.bin

For this to work, a Just-In-Time (JIT) debugger (also known as a postmortem debugger) must be configured on the system. To enable WinDbg as the JIT debugger, run windbg -I from the command line.  For more information see Configuring Automatic Debugging.

Comments

This shellcode was written from scratch with the goal of making it easy to understand (as much as shellcode can be anyway) and to demonstrate how everything works. It is not the smallest or most optimized code possible. There are many other published shellcode examples out there, and the Metasploit source code is particularly worth a look (the path is /external/source/shellcode/windows/x64/src/).

  • Most shellcode does not handle forwarded exports as in this example, because it bloats and complicates the code and can be worked around by determining in advance if the function is forwarded and just writing your code to call the ultimate target instead.  (The only catch is that whether an export is forwarded can change between operating system versions or even service packs, so supporting forwarded exports does in fact make the shellcode more portable.)
  • A common variation on the technique for locating a function is to iterate through the export table computing a “hash” of each function name, and then comparing it to a pre-computed hash value of the name of the function we’re interested in. This has the advantage of making the shellcode smaller, particularly if it uses many API functions with lengthy names, as the code only needs to contain short hash values rather than full strings like “ExitProcess”. The technique also serves to obscure which functions are being called and has even been used by stand-alone malicious executables for this purpose.  Metasploit goes even further and computes a single hash that covers both the function name and DLL name.
  • It is also common practice to “encrypt” or “encode” the shellcode (typically with just a simple XOR type of algorithm rather than true strong encryption), for the purpose of obfuscation and/or avoiding particular byte values in the code (such as zeroes) that could prevent an exploit from working. The encrypted code is then prepended with a “decoder” stub that decrypts and executes the main code.
  • Most shellcode does not bother with the error handling I put in place to return zero if the DLL or function cannot be found, again because it makes the code larger and is not necessary once everything is tested.
  • The lookup_api function does not entirely behave itself according to the x64 calling conventions – in particular it does not bother to save and restore all of the registers that are deemed non-volatile.  (A function is allowed to modify rax, rcx, rdx, r8, r9, r10, and r11, but should preserve the values of all others).  It also makes an assumption that r15 will point to LoadLibraryA if needed for forwarded functions.
  • Metasploit and others use NASM instead of MASM as the assembler. This is probably a good call given the aforementioned limitation of MASM for outputting raw binary; NASM is also open source and runs on Linux and other platforms.
  • Metasploit uses decimal numbers for the various offsets into the data structures whereas I prefer hex (“You might be a geek if…”).
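
The name-hashing variation mentioned above can be sketched in C as follows. The choice of hash is an assumption: the rotate-right-by-13 additive hash used here is a scheme often seen in published Windows shellcode, but nothing in this article depends on it, and real implementations differ in details such as whether the terminating null or the DLL name is included. The function names and the names array (standing in for a DLL's export name table) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32 - n));
}

/* Hash an ASCII name: rotate the running hash right by 13 bits, then add
   the next character. The terminating null is not hashed (an assumption). */
uint32_t ror13_hash(const char *name)
{
    uint32_t h = 0;
    while (*name != '\0') {
        h = ror32(h, 13);
        h += (unsigned char)*name++;
    }
    return h;
}

/* Walk a table of names the way shellcode would walk an export name table,
   returning the index of the first name whose hash matches, or -1. */
int find_by_hash(const char *const *names, size_t count, uint32_t wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (ror13_hash(names[i]) == wanted)
            return (int)i;
    }
    return -1;
}
```

In real shellcode the wanted value would be a precomputed 4-byte constant embedded in the code, so the string "ExitProcess" never appears in the payload. The trade-off is that hash collisions are possible in principle, which is one more thing to verify during testing.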

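The XOR encoding step described above can be sketched the same way. This is a minimal illustration under stated assumptions: a single-byte key, an encoder that runs on the attacker's machine ahead of time, and hypothetical function names. The decoder stub itself would be a few assembly instructions and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR every byte of buf with key, in place. Applying the same key a second
   time restores the original bytes, which is what the decoder stub does. */
void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Search for a single-byte key such that neither the key itself nor any
   encoded byte appears in the list of forbidden values.
   Returns the key (1..255), or 0 if no such key exists. */
int find_xor_key(const unsigned char *buf, size_t len,
                 const unsigned char *bad, size_t nbad)
{
    for (int key = 1; key < 256; key++) {
        int ok = 1;
        for (size_t j = 0; ok && j < nbad; j++) {
            if (key == bad[j])
                ok = 0;               /* the key is embedded in the stub */
            for (size_t i = 0; ok && i < len; i++) {
                if ((buf[i] ^ key) == bad[j])
                    ok = 0;           /* an encoded byte would be forbidden */
            }
        }
        if (ok)
            return key;
    }
    return 0;
}
```

For a payload containing both 0x00 and 0x01 with zero as the only forbidden byte, key 1 is rejected (0x01 would encode to 0x00) and key 2 is the first that works. A single-byte key cannot always satisfy a long forbidden list, which is one reason real encoders use longer keys or other schemes.
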
Mitigations

Unfortunately for exploit developers and fortunately for PC users, the latest versions of Windows employ a variety of effective exploit mitigation technologies.  None of these features truly eliminate vulnerabilities but they can make it significantly more difficult to execute arbitrary code via an exploit as opposed to simply crashing the program. For more information on many of these mitigations and techniques for bypassing them, the Corelan exploit writing tutorials are excellent (32-bit centric but still mostly applicable to x64).

  • Data Execution Prevention (DEP) – This was discussed earlier regarding the VirtualProtect() call in the test program.  By default the stack and heap are configured to use non-executable memory pages which trigger an Access Violation if code attempts to execute there.  DEP can be bypassed using Return-Oriented Programming (ROP), where snippets of existing executable code on the system are executed in sequence to accomplish a particular task.
  • Address Space Layout Randomization (ASLR) – Rather than loading DLLs and EXEs at constant base addresses, the operating system randomly varies the load address (at least across reboots, not necessarily between every invocation of a program).  ASLR does not prevent shellcode from executing (this example code runs just fine with it), but it makes it more difficult to transfer execution to the shellcode in the first place.  It also makes bypassing DEP using ROP much more difficult.  There are several approaches to bypassing ASLR, including the use of a secondary information-disclosure vulnerability to obtain the base address of a module.
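  • Stack cookies – The compiler inserts code into the prologue and epilogue of protected functions that places a random value (the cookie) on the stack between the local buffers and the return address, and checks it before the function returns. An overflow that reaches the return address also corrupts the cookie and is detected, making it more difficult to exploit stack-based buffer overflow vulnerabilities.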
  • Structured Exception Handler (SEH) overwrite protection – this is not applicable to x64 because exception handlers are not stored on the stack.
  • Export Address Table Filtering (EAF) – This is a new option released as part of the Enhanced Mitigation Experience Toolkit (EMET) in November 2010. It is designed to block shellcode from looking up API addresses by accessing DLL export tables, and works by setting a hardware breakpoint on memory access to certain data structures.  Microsoft acknowledges that it can be easily bypassed but argues that it will break almost all shellcode in use today, and that EMET can be updated in response to new attack techniques much more frequently than new versions of Windows can be released.  See this article on bypassing EAF for details.
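
To make the ASLR information-disclosure point concrete, here is the arithmetic involved, as a sketch with entirely hypothetical values. A leaked pointer to a known location inside a module gives the module's randomized base once the known offset (RVA) is subtracted, and any other address in that module follows by adding its RVA. The sanity check relies on Windows loading images at 64 KB-aligned addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Recover a module's load address from one leaked pointer into it.
   known_rva is the offset of the leaked location within the module, taken
   from the file on disk. Returns 0 if the result is not 64 KB aligned,
   which suggests the leak or the RVA is wrong. Hypothetical helper. */
uint64_t module_base_from_leak(uint64_t leaked_addr, uint64_t known_rva)
{
    uint64_t base;

    assert(leaked_addr >= known_rva);
    base = leaked_addr - known_rva;
    if ((base & 0xFFFF) != 0)
        return 0;
    return base;
}

/* Translate an RVA from the on-disk image into a run-time address. */
uint64_t rebase(uint64_t base, uint64_t rva)
{
    return base + rva;
}
```

With a hypothetical leak of 0x00007FFA12341A30 at RVA 0x1A30, the base works out to 0x00007FFA12340000, and a ROP gadget at RVA 0x2F10 would be found at 0x00007FFA12342F10 for as long as the module stays loaded at that address.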
