Windows Assembly Languages


Introduction

When developing in a new language and/or for a new platform (or returning to the language/platform after a long absence), it can sometimes take a while just to know where to begin.  By convention (with a nod to Kernighan & Ritchie), the usual first step is to write a trivial program that simply displays the message “Hello world”, and confirm that you can successfully build and execute it.

The following examples include such Hello World programs for Windows development using assembly language and .Net Intermediate Language. Application developers rarely use these languages, but security researchers sometimes need them. All of these examples are GUI applications that use the MessageBox() API function to display the message in a pop-up box, as opposed to console-mode (DOS-window) applications.

Hello World Message Box

Command-line build instructions are provided. The Windows SDK (formerly the Platform SDK) must be installed, as well as Microsoft Visual C++ (the free Express Edition is fine), because the latest SDK does not include all of the tools needed.

.Net Intermediate Language (IL)

All .Net languages ultimately compile to a terse but human-readable language called Intermediate Language (IL). It is formally known as Common Intermediate Language (CIL) and was formerly known as Microsoft Intermediate Language (MSIL). Intermediate Language is then assembled into “bytecode” that the .Net runtime can execute.

It is also possible for adventurous souls to write programs directly in IL, and there are some things that can be done in IL that have no equivalent in C# or other languages. Malware written in IL has been observed, and therefore it may be necessary for security analysts to be able to understand IL code. The IL Disassembler tool included with the Windows SDK can be used to view the IL from an existing .Net executable or library. Here is a minimal version from scratch:

//hello.il

.assembly extern mscorlib {}
.assembly extern System.Windows.Forms {
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)
    .ver 4:0:0:0
}

.assembly HelloIL {}

.namespace HelloIL
{
    .class Hello
    {
        .method static public void Main() cil managed
        {
            .entrypoint
            .maxstack 8
            ldstr "Hello world!"
            ldstr "Message"
            call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string,string)
            pop
            ret
        }
    }
}

Build with:

ilasm hello.il

A thorough explanation of this code is beyond the scope of this article, but note the following:

  • Whichever method contains the .entrypoint directive is considered the program entry point. It is not required to be called Main() (or even to be inside a class), although its signature is restricted: it must be static, take either no parameters or a string array, and return void or an integer.
  • Lines 3-7 declare the required external .Net assemblies and their version information (we can get away with not specifying a version of mscorlib, the core .Net library).
  • Every type or method from an external library must be prefixed with the assembly name in square brackets, e.g. [System.Windows.Forms]
  • When calling a method, arguments are pushed onto the evaluation stack left-to-right. The call consumes them and pushes the return value. Main does not need that value, so it discards it with pop.
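To make the evaluation-stack behaviour concrete, here is a toy Python model that replays the body of Main. It is only a sketch under stated assumptions. The tuple instruction format, the METHODS table and the fake_messagebox_show stand-in are hypothetical and say nothing about how the CLR actually executes IL.

```python
# Toy model of the CIL evaluation stack, replaying the body of Main in hello.il.
# The instruction tuples, METHODS table and fake_messagebox_show are hypothetical
# stand-ins for illustration; this is not how the CLR executes IL.

def fake_messagebox_show(text, caption):
    """Stand-in for MessageBox::Show(string,string); returns a DialogResult-like value."""
    print(f"[{caption}] {text}")
    return "OK"

# method name -> (Python function, number of arguments it takes)
METHODS = {"MessageBox::Show": (fake_messagebox_show, 2)}

def run(instructions):
    """Execute the instructions, returning the final stack and a per-step snapshot."""
    stack = []
    trace = []
    for op, *operand in instructions:
        if op == "ldstr":
            stack.append(operand[0])            # push a string reference
        elif op == "call":
            func, argc = METHODS[operand[0]]
            first = len(stack) - argc
            args = stack[first:]                # arguments, in left-to-right order
            del stack[first:]                   # the call consumes its arguments
            stack.append(func(*args))           # the non-void result is pushed
        elif op == "pop":
            stack.pop()                         # discard the value on top
        elif op == "ret":
            break
        else:
            raise ValueError(f"unsupported opcode: {op}")
        trace.append(list(stack))               # snapshot after each instruction
    return stack, trace

main_body = [
    ("ldstr", "Hello world!"),
    ("ldstr", "Message"),
    ("call", "MessageBox::Show"),
    ("pop",),
    ("ret",),
]

final_stack, trace = run(main_body)
for step in trace:
    print(step)
max_depth = max(len(step) for step in trace)
```

The trace shows both strings on the stack just before the call, the single return value after it, and an empty stack after pop. The maximum depth is 2, comfortably within the .maxstack 8 declared in hello.il.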

win32 API – x86 assembly

The main difference in using assembly language compared to C is that you need to understand the function calling conventions (the way parameters and return values are passed). The 32-bit Windows API uses the stdcall convention. Under stdcall, parameters are pushed onto the stack from right to left, and the called function removes them from the stack before it returns. The return value comes back in the eax register (we ignore it here).
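The following Python sketch models the 32-bit stack as the MessageBoxA call in hello32.asm builds it. It shows why pushing right to left leaves the first parameter nearest the top of the stack. The starting address is arbitrary, and the parameter labels are illustrative. Under stdcall the callee releases the 16 bytes of parameters itself, typically with ret 16, which is why hello32.asm has no add esp after the call.

```python
# Model of the 32-bit stack during the stdcall call to MessageBoxA in hello32.asm.
# Addresses are arbitrary (hypothetical); only the offsets relative to esp matter.

def stdcall_frame(args, esp=0x0012FF80):
    """Push args right to left, then a return address, as push/call would.

    Returns (memory, esp) where memory maps address -> value and esp is the
    stack pointer the callee sees on entry.
    """
    memory = {}
    for value in reversed(args):      # stdcall: the last parameter is pushed first
        esp -= 4                      # each push moves esp down by 4 bytes
        memory[esp] = value
    esp -= 4
    memory[esp] = "return address"    # pushed by the call instruction
    return memory, esp

# MessageBoxA parameters, in left-to-right (declaration) order
args = ["hWnd = 0", "lpText = offset text", "lpCaption = offset caption", "uType = MB_OK"]
memory, esp = stdcall_frame(args)

for i, arg in enumerate(args):
    offset = 4 + 4 * i                # first parameter sits just above the return address
    print(f"[esp+{offset}] = {memory[esp + offset]}")

# the callee pops the return address and then releases these bytes (e.g. ret 16)
bytes_released_by_callee = 4 * len(args)
```

Inside MessageBoxA, hWnd is therefore at [esp+4] and uType at [esp+16], even though uType was pushed first.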

;hello32.asm

.386
.model flat

MessageBoxA proto stdcall hwnd:dword, text:dword, caption:dword, buttons:dword
ExitProcess proto stdcall exitcode:dword

.data

text        db  'Hello world!', 0
caption     db  'Message', 0

.code

main proc
    push 0                  ;MB_OK
    push offset caption     ;lpCaption
    push offset text        ;lpText
    push 0                  ;hWnd
    call MessageBoxA
    push 0
    call ExitProcess
main endp

end

Ensure that the x86 build environment is selected (you may need to run setenv.bat /x86), then build with:

ml hello32.asm /link /entry:main /subsystem:windows /defaultlib:kernel32.lib /defaultlib:user32.lib

There are a few caveats around assembling and linking this code:

  • The .386 and .model directives must come first in the file, even before the API function prototypes. Otherwise the assembler may generate 16-bit relocations in the resulting object file. The linker then chokes on these with the cryptic message “fatal error LNK1190: invalid fixup found, type 0x0002” (which is also poorly documented in MSDN).
  • It seems that you have to specify the full parameter list in the prototypes and can’t get away with just declaring “extrn MessageBoxA”. The reason is that the 32-bit import libraries (e.g. user32.lib) export “decorated” (mangled) names like _MessageBoxA@16, where the numeric suffix is the number of bytes of stack space used for parameters (4 parameters x 4 bytes each = 16). Without the parameter list the assembler cannot produce the decorated name. The linker then finds no matching symbol and fails with “LNK1120: unresolved externals”.
  • Running the dumpbin utility (included with Visual C++) against the .obj and .lib files is helpful for diagnosing these types of issues, particularly with the /exports, /symbols, and /relocations switches.  Also try the /verbose linker option (add to the ml command line after /link).  Matt Pietrek’s linkers article is worth reading to understand how it all works.
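The decoration rule described above is simple enough to sketch in a few lines of Python. This is an illustration, not a reimplementation of the assembler. It assumes each parameter is rounded up to a 4-byte stack slot, which covers the DWORD and pointer parameters used here. The results can be compared with the external symbols that dumpbin /symbols reports for hello32.obj.

```python
# Sketch of 32-bit stdcall name decoration: _Name@N, where N is the number of
# bytes of stack used by the parameters. Assumes each parameter is rounded up
# to a 4-byte stack slot (true for DWORD and pointer parameters).

def stdcall_decorated_name(name, param_sizes):
    """Return the decorated symbol name for a stdcall function."""
    stack_bytes = sum((size + 3) // 4 * 4 for size in param_sizes)
    return f"_{name}@{stack_bytes}"

# MessageBoxA(HWND, LPCSTR, LPCSTR, UINT): four 4-byte parameters
print(stdcall_decorated_name("MessageBoxA", [4, 4, 4, 4]))  # prints _MessageBoxA@16
# ExitProcess(UINT): one 4-byte parameter
print(stdcall_decorated_name("ExitProcess", [4]))           # prints _ExitProcess@4
```

A hypothetical function taking a 1-byte and an 8-byte argument would come out as _Name@12, because the single byte still occupies a full 4-byte slot.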

win32 API – x64 assembly

There are a variety of other terms in use for what Microsoft calls “x64”, including AMD64, Intel 64, and x86-64. x64 is a 64-bit extension of the 32-bit x86 architecture. It is not to be confused with IA-64, which refers to the vastly different Itanium processor architecture. The main differences in assembly programming are the new 64-bit registers and the fact that x64 editions of Windows use a different calling convention for the Windows API.

The calling convention is based on the x86 fastcall convention and passes the first four parameters in registers rather than on the stack. From left to right, integer and pointer parameters are passed in rcx, rdx, r8, and r9, and any further parameters go on the stack. The caller must still reserve 32 bytes of stack space (the “home” or “shadow” space) for the four register parameters. There is a lot more involved, including stack alignment requirements and the use of xmm0-xmm3 for floating-point parameters. For the full scoop refer to the x64 Software Conventions on MSDN.
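Here is a small Python sketch of the register assignment just described. It covers only integer and pointer arguments; floating-point arguments, structures and the other cases in the x64 Software Conventions are deliberately left out. The stack offsets are as the caller sees them just before the call instruction. The 32-byte home space occupies [rsp] to [rsp+1Fh], so a fifth argument lands at [rsp+20h].

```python
# Simplified model of where the x64 Windows convention places integer/pointer
# arguments, from the caller's point of view just before the call instruction.
# Floating-point (xmm0-xmm3) and aggregate arguments are not modelled.

INT_ARG_REGS = ["rcx", "rdx", "r8", "r9"]
HOME_SPACE = 32          # bytes reserved for the four register arguments

def x64_arg_locations(arg_count):
    """Return a location string for each argument, in left-to-right order."""
    locations = []
    for i in range(arg_count):
        if i < len(INT_ARG_REGS):
            locations.append(INT_ARG_REGS[i])
        else:
            # arguments 5 and up are stored above the home space, 8 bytes apart
            offset = HOME_SPACE + 8 * (i - len(INT_ARG_REGS))
            locations.append(f"[rsp+{offset:X}h]")
    return locations

# MessageBoxA(hWnd, lpText, lpCaption, uType) as called in hello64.asm
print(x64_arg_locations(4))   # ['rcx', 'rdx', 'r8', 'r9']
# a hypothetical six-argument function
print(x64_arg_locations(6))
```

For the hypothetical six-argument call, the fifth and sixth arguments come out at [rsp+20h] and [rsp+28h].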

;hello64.asm

extrn MessageBoxA : proc
extrn ExitProcess : proc

.data

text        db  'Hello world!', 0
caption     db  'Message', 0

.code

main proc frame
    sub rsp, 28h
    .allocstack 28h
    .endprolog
    xor r9, r9          ;MB_OK
    lea r8, caption     ;lpCaption
    lea rdx, text       ;lpText
    xor rcx, rcx        ;hWnd
    call MessageBoxA
    xor rcx, rcx
    call ExitProcess
    add rsp, 28h
main endp

end

Ensure that the x64 build environment is selected (setenv.bat /x64) and build using the 64-bit assembler:

ml64 hello64.asm /link /subsystem:windows /defaultlib:kernel32.lib /defaultlib:user32.lib /entry:main

Note that compared to the 32-bit code:

  • The .386 and .model directives are not used (or allowed).
  • The Windows API functions can simply be declared as extrn without prototypes (“name mangling” is not used in the x64 import libraries).
  • The frame directive after proc, along with the .allocstack and .endprolog directives, is new for x64. These directives declare the function’s unwind information, which describes the function prologue for debugging and exception handling purposes. The program will still build and run if these directives are omitted. However, according to the x64 conventions every non-leaf function (a function that calls other functions) is supposed to have unwind information declared. This information is not executable code. It is stored as data (the function table in the .pdata section of the executable points to it) and can be viewed with dumpbin /unwindinfo. For more details see Unwind Helpers for MASM in the x64 Software Conventions.
  • All non-leaf functions must reserve enough stack space for the largest number of parameters taken by any function they call. The minimum is 4 parameters, even if every called function takes fewer than 4. The reservation may also need to be rounded up to keep the stack aligned on a 16-byte boundary. Refer to Stack Usage in the x64 Software Conventions documentation for details. This is important: if insufficient stack space is reserved or alignment is not maintained, the program may crash. In this case 28h (40 decimal) bytes are reserved: 32 bytes for the 4 parameters plus an extra 8 bytes for alignment. The extra 8 bytes are needed because rsp is 8 bytes off a 16-byte boundary on entry, since the call instruction pushed an 8-byte return address.
  • The stack needs to be adjusted at the end of the function by the same amount reserved at the beginning. The final instruction in main (add rsp, 28h) is shown for illustration purposes only. It is never executed, because ExitProcess does not return. A function that does return would follow it with ret.
  • It does not matter in what order the parameter values are assigned to registers before the call (as long as the correct register is used for each parameter), but it is common practice to assign them in right-to-left order (as seen in this code) because programmers are so accustomed to seeing it that way in 32-bit code with the old convention.
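The stack-reservation rule from the list above can be written as a short Python function. It is a sketch for simple cases only. It assumes the frame consists of the outgoing-argument area, optional local variables and optional pushed registers. The local_bytes and pushed_regs parameters are illustrative names, not part of any tool. The function reproduces the 28h used in hello64.asm.

```python
# Sketch of the "sub rsp, N" arithmetic for a simple non-leaf x64 function.
# Assumes the frame holds only: pushed nonvolatile registers, local variables,
# and the outgoing argument area (with a minimum of 4 slots of home space).

def x64_stack_reserve(max_callee_args, local_bytes=0, pushed_regs=0):
    """Bytes to subtract from rsp after any pushes so that rsp is 16-byte
    aligned at each call site and outgoing arguments fit."""
    total = 8 * max(4, max_callee_args) + local_bytes
    # On entry rsp is 8 bytes off a 16-byte boundary (the return address),
    # and each pushed register moves it another 8 bytes.
    misalignment = (8 + 8 * pushed_regs + total) % 16
    if misalignment:
        total += 16 - misalignment
    return total

print(hex(x64_stack_reserve(4)))                 # 0x28, as in hello64.asm
print(hex(x64_stack_reserve(6)))                 # 0x38
print(hex(x64_stack_reserve(4, pushed_regs=1)))  # 0x20
```

Five outgoing arguments also give 28h, because the fifth argument slot takes the 8 bytes that would otherwise be alignment padding.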
