Heap Internals — How glibc malloc Works
Thilan Dissanayaka Exploit Development April 10, 2020

So far in this series, we’ve exploited the stack — buffer overflows, ROP chains, format strings. The stack is predictable: local variables go in, function returns pop them out, everything follows a strict LIFO order.

The heap is a completely different beast. It’s dynamic memory — malloc() and free(). Programs request memory of arbitrary sizes at arbitrary times and free it in any order. The heap allocator has to manage this chaos efficiently, and the bookkeeping metadata it uses to track allocations is exactly what attackers target.

Heap exploitation is how most modern exploits work. Browser exploits, kernel exploits, server exploits — the majority involve heap corruption. The stack is too well-protected (canaries, ASLR, DEP). The heap is where the action is.

But before we can exploit the heap, we need to understand how it works. This article covers the internals of glibc’s malloc (ptmalloc2) — the default allocator on Linux. The next article covers the actual exploitation techniques.

Stack vs Heap — The Fundamental Difference

Stack:
- Fixed size (usually 8MB per thread)
- Grows/shrinks automatically with function calls
- LIFO order — last allocated, first freed
- Managed by the compiler (push/pop)
- Local variables, function arguments, return addresses

Heap:
- Dynamic size (grows as needed)
- Allocated/freed explicitly by the programmer
- No ordering — free in any order
- Managed by the allocator (malloc/free)
- Dynamically sized data, objects, buffers

void example() {
    int x = 42;                    // Stack — automatic, freed when function returns
    char *buf = malloc(256);       // Heap — manual, stays until free() is called
    free(buf);                     // Programmer must free explicitly
}

The stack is simple. The heap is complex. And complexity breeds vulnerabilities.

malloc() and free() — The Interface

The programmer sees a simple API:

void *malloc(size_t size);    // Allocate 'size' bytes, return pointer
void free(void *ptr);          // Free previously allocated memory
void *realloc(void *ptr, size_t size);  // Resize allocation
void *calloc(size_t nmemb, size_t size);  // Allocate and zero-initialize

Behind this simple interface, glibc’s malloc maintains a sophisticated system of chunks, bins, and arenas to manage memory efficiently.

Chunks — The Building Block

Every malloc() allocation is wrapped in a chunk. The chunk contains metadata (header) and the user data.

Allocated Chunk Layout

    chunk base (pointer returned by malloc, minus 8 bytes on 32-bit / 16 bytes on 64-bit)
    |
    v
┌────────────────────────────────────────┐
│ prev_size (4/8 bytes)                   │  ← Size of previous chunk (if free)
├────────────────────────────────────────┤
│ size (4/8 bytes)     | A | M | P       │  ← Size of this chunk + flags
├────────────────────────────────────────┤  ← Pointer returned to user
│                                         │
│ User Data                               │
│                                         │
├────────────────────────────────────────┤
│ (padding to 16-byte alignment)          │
└────────────────────────────────────────┘

The pointer returned by malloc() points to the user data, not the chunk header. The header sits 8 bytes (32-bit) or 16 bytes (64-bit) before the returned pointer.

The Size Field and Flag Bits

The size field stores the chunk’s total size, but the bottom 3 bits are repurposed as flags (chunks are aligned to at least 8 bytes — 16 on 64-bit — so the low bits of a real size would always be zero):

Bit Name Meaning
Bit 0 P (PREV_INUSE) Previous chunk is in use (not free)
Bit 1 M (IS_MMAPPED) Chunk was allocated via mmap()
Bit 2 A (NON_MAIN_ARENA) Chunk belongs to a non-main arena (threading)

To get the actual size: real_size = size & ~0x7 (mask off the bottom 3 bits).

The PREV_INUSE bit is critical for exploitation — it tells the allocator whether the chunk before this one is free (and thus can be coalesced).

Free Chunk Layout

When a chunk is freed, the user data area is repurposed to store free list pointers:

┌────────────────────────────────────────┐
│ prev_size                               │
├────────────────────────────────────────┤
│ size                      | A | M | P  │
├────────────────────────────────────────┤
│ fd (forward pointer)                    │  ← Points to next free chunk
├────────────────────────────────────────┤
│ bk (backward pointer)                   │  ← Points to previous free chunk
├────────────────────────────────────────┤
│ (remaining space unused)                │
│                                         │
└────────────────────────────────────────┘

The fd and bk pointers form a doubly-linked list of free chunks. This is where the allocator finds memory to satisfy future malloc() calls.

These pointers occupy the same memory that was previously user data. The allocator knows it’s safe to write here because the chunk is free — nobody should be reading this memory anymore. (Spoiler: use-after-free exploits violate this assumption.)

Visualizing with GDB

Let’s see real chunks in memory:

#include <stdlib.h>
#include <string.h>

int main() {
    char *a = malloc(32);
    char *b = malloc(32);
    char *c = malloc(32);

    strcpy(a, "AAAA");
    strcpy(b, "BBBB");
    strcpy(c, "CCCC");

    free(b);  // Free the middle chunk

    return 0;
}
$ gcc -m32 -g -o heap_demo heap_demo.c
$ gdb ./heap_demo
(gdb) break 15   # The return statement — after free(b)
(gdb) run

Examining chunk A (allocated):

(gdb) x/4wx (char*)a - 8
0x804b150: 0x00000000  0x00000029  0x41414141  0x00000000
           prev_size   size(40+1)  "AAAA"      (padding)

Size is 0x29 = 41. The bottom bit (PREV_INUSE) is set, meaning the chunk before A is in use. The actual chunk size is 0x28 = 40 bytes (8 header + 32 user data).

Examining chunk B (freed):

(gdb) x/4wx (char*)b - 8
0x804b178: 0x00000000  0x00000029  0x0804b1a0  0xf7fb87b0
           prev_size   size(40+1)  fd           bk

B is free. The user data area now contains fd and bk — pointers into the free list. The allocator will use these to find B when a future malloc(32) is requested.

Bins — Where Free Chunks Live

When you free() a chunk, it doesn’t go back to the OS immediately. It goes into a bin — a list of free chunks organized by size. When malloc() is called, the allocator checks the appropriate bin first.

glibc has several types of bins:

Fast Bins

Fast bins are singly-linked lists (only fd pointer, no bk) for small chunks (16 to 80 bytes on 32-bit, 32 to 160 bytes on 64-bit). They’re LIFO — last freed, first allocated.

Fastbin[0]:  chunk_c → chunk_b → chunk_a → NULL
             (size 16)

Fastbin[1]:  chunk_f → chunk_e → NULL
             (size 24)

...up to 10 fastbins

Key properties:

  • No coalescing — adjacent free chunks in fast bins are NOT merged together
  • LIFO order — free(a); free(b); malloc(same_size) returns b (last freed, first out)
  • Singly linked — only fd, no bk
  • PREV_INUSE bit is not cleared — the next chunk doesn’t know this chunk is free

Fast bins exist for performance. Small, frequent allocations (the most common case) are served extremely quickly.

Unsorted Bin

When a non-fastbin chunk is freed, it first goes into the unsorted bin — a single doubly-linked list that acts as a staging area.

Unsorted bin:  chunk_x ⟷ chunk_y ⟷ chunk_z

When malloc() can’t find a fast bin match, it searches the unsorted bin. Each chunk in the unsorted bin is either:

  • Returned directly if it’s the right size
  • Sorted into the appropriate small bin or large bin

Small Bins

Small bins hold chunks of a specific size. There are 62 small bins, each for a different chunk size (16, 24, 32, …, 504 bytes on 32-bit).

Smallbin[0]:  chunks of size 16
Smallbin[1]:  chunks of size 24
...
Smallbin[61]: chunks of size 504

Small bins are doubly-linked (FIFO). They’re exact-fit — malloc(24) only checks the smallbin for size 24.

Large Bins

Large bins hold chunks of a range of sizes. There are 63 large bins, and each covers an increasingly wide range:

Largebin[0]:  512-568 bytes
Largebin[1]:  576-632 bytes
...
Largebin[62]: very large chunks

Large bins are sorted within each bin (largest first), so the allocator can find the best fit.

Bin Summary

Bin Type Size Range (32-bit) Structure Order Coalescing
Fast bins 16-80 bytes Singly-linked LIFO No
Unsorted bin Any (staging) Doubly-linked Insertion order Yes
Small bins 16-504 bytes Doubly-linked FIFO Yes
Large bins 512+ bytes Doubly-linked + sorted Best fit Yes

Tcache — The Modern Fast Path

Starting with glibc 2.26 (2017), a new layer was added: tcache (thread-local cache). Tcache is a per-thread cache of recently freed chunks that sits in front of all other bins.

Each thread has:
  tcache_entry[0]:  requests up to 24 bytes   → chunk → chunk → NULL  (max 7)
  tcache_entry[1]:  requests up to 40 bytes   → chunk → chunk → NULL  (max 7)
  ...
  tcache_entry[63]: requests up to 1032 bytes → ...

(request sizes shown for 64-bit; each bin holds one 16-byte-spaced chunk size)

Key properties:

  • 64 bins, each holding up to 7 chunks
  • Singly-linked (like fast bins)
  • LIFO order
  • Minimal security checks at first — no double-free detection before glibc 2.29, no pointer mangling before 2.32 — which made tcache extremely easy to exploit
  • Checked before any other bin

When you call free():

  1. If tcache has room for this size → put it in tcache
  2. Otherwise → fall through to normal bins (fastbins, unsorted, etc.)

When you call malloc():

  1. Check tcache for this size → return if found
  2. Otherwise → check fastbins, unsorted bin, small/large bins, etc.

Tcache was designed for performance (avoiding lock contention in multithreaded programs), but its lack of security checks in early versions made it a goldmine for heap exploitation.

Coalescing — Merging Free Chunks

When a chunk is freed (and it’s not a fast bin chunk), the allocator checks whether the adjacent chunks are also free. If so, it merges (coalesces) them into a single larger chunk.

Before free(B):
┌──────────┬──────────┬──────────┐
│  A (used) │ B (used) │ C (free) │
└──────────┴──────────┴──────────┘

After free(B):  B is merged with C
┌──────────┬─────────────────────┐
│  A (used) │    B+C (free)       │
└──────────┴─────────────────────┘

Coalescing prevents fragmentation — without it, the heap would fill up with tiny unusable gaps.

How Coalescing Works

When freeing chunk B, the allocator:

  1. Check forward — Is the chunk after B free? A chunk’s free status is recorded in the following chunk’s header, so the allocator reads the PREV_INUSE bit of the chunk after that one.
  2. Check backward — Is the chunk before B free? Check B’s own PREV_INUSE bit. If it’s clear (0), the previous chunk is free, and B’s prev_size field tells us its size.
  3. Merge — Combine adjacent free chunks into one larger chunk.

The prev_size field and PREV_INUSE bit are critical for this process — and they’re exactly what heap exploits manipulate.

The Top Chunk (Wilderness)

At the top of the heap sits the top chunk (also called the wilderness). It’s the last chunk in the heap, bordering unmapped memory.

When no bin has a suitable free chunk, the allocator carves off a piece of the top chunk:

Before malloc(100):
┌─────────────┬──────────────────────────────────┐
│  used chunks │        Top Chunk (large)          │
└─────────────┴──────────────────────────────────┘

After malloc(100):
┌─────────────┬────────────┬─────────────────────┐
│  used chunks │ new (100)  │  Top Chunk (smaller) │
└─────────────┴────────────┴─────────────────────┘

If the top chunk is too small, the allocator extends the heap using sbrk() or mmap().

The Arena

An arena is a complete set of bins plus a mutex for thread safety. The main thread uses the main arena. Additional threads may get their own arenas to avoid lock contention.

Main Arena:
  - Fast bins [0..9]
  - Unsorted bin
  - Small bins [0..61]
  - Large bins [0..62]
  - Top chunk
  - Mutex

In multithreaded programs, thread contention on the arena mutex can be a bottleneck — which is why tcache (per-thread, no locking) was introduced.

Heap Inspection Tools

GDB with pwndbg/GEF

The most powerful way to inspect heap state:

$ gdb ./heap_demo

# With pwndbg:
pwndbg> heap          # Show all chunks
pwndbg> bins          # Show all bins (fast, tcache, unsorted, small, large)
pwndbg> vis_heap_chunks  # Visual representation of chunks

# With GEF:
gef> heap chunks      # List all chunks
gef> heap bins        # Show bins

Example bins output:

pwndbg> bins
tcachebins
0x30 [  2]: 0x555555559310 —▸ 0x5555555592e0 ◂— 0x0
fastbins
empty
unsortedbin
all: 0x555555559370 —▸ 0x7ffff7fb0be0 (main_arena+96) ◂— 0x555555559370
smallbins
empty
largebins
empty

ltrace

Trace malloc/free calls:

$ ltrace ./heap_demo
malloc(32)                           = 0x5655a1a0
malloc(32)                           = 0x5655a1c8
malloc(32)                           = 0x5655a1f0
free(0x5655a1c8)                     = <void>

Security Checks in Modern glibc

Over the years, glibc has added numerous security checks to detect heap corruption:

Check What It Detects Added In
Double-free detection (fastbin) Freeing the same chunk twice in a row glibc 2.x (early)
Unlink safe check Corrupted fd/bk pointers during coalescing glibc 2.3.4
Fastbin size check Corrupted size field in fast bin chunk glibc 2.4
Top chunk size check Corrupted top chunk size glibc 2.x
Tcache double-free (key check) Double-free in tcache glibc 2.29
Tcache count check Tcache bin exceeding 7 entries glibc 2.30
Safe-linking Pointer mangling in fast bins and tcache glibc 2.32

Safe-linking (glibc 2.32+) is the most significant recent addition. It XORs free list pointers with a value derived from the address where the pointer is stored — an address randomized by ASLR — making it much harder to forge pointers:

// Instead of storing the raw pointer:
entry->fd = raw_pointer;

// Safe-linking stores (the XOR key is the storage address shifted right by 12):
entry->fd = raw_pointer ^ ((size_t)&entry->fd >> 12);

An attacker who doesn’t know the heap address can’t forge valid safe-linked pointers. However, if they can leak a heap address (or a safe-linked pointer), they can recover the XOR key and bypass the protection.

Summary — What Matters for Exploitation

The key concepts for heap exploitation:

  1. Chunk metadata (size, PREV_INUSE, fd, bk) sits right next to user data — a heap overflow can corrupt it
  2. Free lists (tcache, fastbins, bins) use pointers stored in freed chunks — controlling these pointers means controlling where malloc() returns
  3. Coalescing trusts prev_size and PREV_INUSE — corrupting these can cause overlapping chunks
  4. Tcache has fewer checks than other bins — it’s the easiest exploitation target on modern glibc
  5. The allocator reuses memory — free(ptr) doesn’t zero the memory, and the next malloc() of the same size typically returns the same address

In the next article, we’ll exploit every one of these properties.

Happy reversing!
