Understanding the Heap Internals
Thilan Dissanayaka Exploit development Apr 12, 2026

Understanding the Heap Internals

So far in this series, we've exploited the stack buffer overflows, ROP chains, format strings. The stack is predictable: local variables go in, function returns pop them out, everything follows a strict LIFO order.

The heap is a completely different beast. It's dynamic memory malloc() and free(). Programs request memory of arbitrary sizes at arbitrary times and free it in any order. The heap allocator has to manage this chaos efficiently, and the bookkeeping metadata it uses to track allocations is exactly what attackers target.

Heap exploitation is how most modern exploits work. Browser exploits, kernel exploits, server exploits the majority involve heap corruption. The stack is too well-protected (canaries, ASLR, DEP). The heap is where the action is.

But before we can exploit the heap, we need to understand how it works. This article covers the internals of glibc's malloc (ptmalloc2) — the default allocator on Linux. The next article covers the actual exploitation techniques.

I know. This going to be a very lengthy article, BTW I did not want to split this into multiple posts. Here you can find the whole complete idea about the heap you need for the Heap based buffer overflows.

Stack vs Heap — The Fundamental Difference

Stack:.

  • Fixed size (usually 8MB per thread)
  • Grows/shrinks automatically with function calls
  • LIFO order — last allocated, first freed
  • Managed by the compiler (push/pop)
  • Local variables, function arguments, return addresses

Heap:

  • Dynamic size (grows as needed)
  • Allocated/freed explicitly by the programmer
  • No ordering — free in any order
  • Managed by the allocator (malloc/free)
  • Dynamically sized data, objects, buffers

To understand the differences lets consider a simple C function.

void memoryAllocation() {
    int x = 42;                    // Stack — automatic, freed when function returns

    char *buf = malloc(256);       // Heap — manual, stays until free() is called
    free(buf);                     // Programmer must free explicitly
}

The stack is simple. The heap is complex. And complexity breeds vulnerabilities.

malloc() and free() — The Interface

The programmer sees a simple API:

void *malloc(size_t size);                 // Allocate 'size' bytes, return pointer
void free(void *ptr);                      // Free previously allocated memory
void *realloc(void *ptr, size_t size);     // Resize allocation
void *calloc(size_t nmemb, size_t size);   // Allocate and zero-initialize

Behind this simple interface, glibc's malloc maintains a sophisticated system of chunks, bins, and arenas to manage memory efficiently.

Chunks — The Building Block

The first and most important concept in this tutorial is the chunk. As the first step we are going to see what is a chunk and how it looks like.

Every malloc() allocation is wrapped in a chunk.

qyaoxeh5aevx9mplrfth.png

Each chunk contains metadata (header) and the user data.

Allocated Chunk Layout

The actual layout of a chunk is something like below.

kboadpvcurzlujze0lws.png

We know that whenever we use malloc() function we recieve a pointer. (If memory allocation was success). This pointer is always pointed to the user data. Not to the starting of the chunk. In the 32 bit architecture the meta data is 8 bytes in size. So, if we want to get the address of the starting point of the chunk we have do substract 8 from the pointer which is returned by malloc(). We need to keep this in mind.

Now lets dive deep in to the chunk and see how it looks like.

ori3jt05whlithffk8mp.png

The meta data section has two parts,

Prev size

This indicates the size of the previous chunk. Actually this does not represent the size always. I'll explain it later. For now keep in mind this represents the size of the previous chunk.

Size + Flags

As the name suggests, this indicates the size field stores the chunk's total size, but the bottom 3 bits are repurposed as flags.

Wait. How that is possible? Let me explain.

The chunk size is always a multiplication of 8. Such as 8 bytes, 16 bytes, 24 bytes etc.

$$ 8_{10} = 00001000_2 $$

$$ 16_{10} = 00010000_2 $$

$$ 24_{10} = 00011000_2 $$

$$ 32_{10} = 00100000_2 $$

So the last 3 bits of the size would always be zero. We can use these three bits as the space for the flags.

Actual Size in bytes + | A | M | P |
Bit Name Meaning
Bit 0 P (PREV_INUSE) Previous chunk is in use (not free)
Bit 1 M (IS_MMAPPED) Chunk was allocated via mmap()
Bit 2 A (NON_MAIN_ARENA) Chunk belongs to a non-main arena (threading)

The PREV_INUSE bit is critical for exploitation. It tells the allocator whether the chunk before this one is free.

Lets consider an example, We see following as the metadata for the current chunk.

 size + flags = 0x00000021

Now,

$$ 00000021_{16} = 33_{10} $$

So this does not mean the actual size is 33 bytes.

To get the actual size, we need to convert it into binary and disregard the last three bits,

$$ 33_{10} = 0100001_2 $$

After removing the last three digits it is 0100000.

$$ 0100000_2 = 32_{10} $$

In a progamaticle way we can say this in below way,

We can use the bitwise AND operator with 11111000 to get rid of the last three bits.

$$ 0100001_{2} \land 11111000_{2} $$

  • tcache (modern glibc, default)
  • fastbins
  • unsorted / small / large bins

Bins — Where Free Chunks Live

There are four types of bins.

Fast Bins

A fast bin is made of a linked list. More specifically a singly linked list. Oh, Finally an actual usage of Linked list :).

Fastbins handle chunks from 0x10 (16 bytes) to to 0x50 (80 bytes) TOTAL chunk size.

So valid fastbin chunk sizes:

0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40, 0x48, 0x50

And one another important thing is they're LIFO. That means last freed, first allocated. Don't worry I'll explain these clearly.

The layout of a chunk after it is moved into a fast bin is below.

kyjlfaqki6rcnm8hvri3.png

As I mentioned there is a one pointer called fd. And it is placed at the user data section. Because after using free() we are not going to use those user data. Since that we can use that space for the pointer.

So, what does this fd pointer do? So ultimately the fastbin section is a collection of linked lists (Fast bins).

Fastbin[0]:  chunk_c → chunk_b → chunk_a → NULL
             (size 16)

Fastbin[1]:  chunk_f → chunk_e → NULL
             (size 24)

...up to 10 fastbins

Key properties: - No coalescing — adjacent free chunks in fast bins are NOT merged together - LIFO order — free(a); free(b); malloc(same_size) returns b (last freed, first out) - Singly linked — only fd, no bk - PREV_INUSE bit is not cleared — the next chunk doesn't know this chunk is free Fast bins exist for performance. Small, frequent allocations (the most common case) are served extremely quickly.

Unsorted Bin

When a non-fastbin chunk is freed, it first goes into the unsorted bin — a single doubly-linked list that acts as a staging area.

When a chunk is freed, the user data area is repurposed to store free list pointers:

bbtxsyq98noda1mqa9qv.png

The fd and bk pointers form a doubly-linked list of free chunks. This is where the allocator finds memory to satisfy future malloc() calls.

These pointers occupy the same memory that was previously user data. The allocator knows it's safe to write here because the chunk is free — nobody should be reading this memory anymore. (Spoiler: use-after-free exploits violate this assumption.)

Unsorted bin:  chunk_x ⟷ chunk_y ⟷ chunk_z

When malloc() can't find a fast bin match, it searches the unsorted bin. Each chunk in the unsorted bin is either:

  • Returned directly if it's the right size
  • Sorted into the appropriate small bin or large bin

Small Bins

Small bins hold chunks of a specific size. There are 62 small bins, each for a different chunk size (16, 24, 32, ..., 504 bytes on 32-bit).

Smallbin[0]:  chunks of size 16
Smallbin[1]:  chunks of size 24
...
Smallbin[61]: chunks of size 504

Small bins are doubly-linked (FIFO). They're exact-fit — malloc(24) only checks the smallbin for size 24.

Large Bins

Large bins hold chunks of a range of sizes. There are 63 large bins, and each covers an increasingly wide range:

Largebin[0]:  512-568 bytes
Largebin[1]:  576-632 bytes
...
Largebin[62]: very large chunks

Large bins are sorted within each bin (largest first), so the allocator can find the best fit.

Tcache — The Modern Fast Path

Starting with glibc 2.26 (released in 2017), a new layer was added on top of everything we've discussed so far the thread-local cache, or tcache. If you're on a modern Linux distribution, tcache is the very first thing malloc() checks and the very first place free() puts a chunk.

The motivation is performance. Fast bins were fast, but they were still part of the main arena — accessing them required taking an arena lock in multithreaded programs. Tcache sidesteps this entirely: each thread gets its own private cache, no locks needed.

Thread 1 tcache:                Thread 2 tcache:
  [0x20] → A → B → NULL           [0x20] → X → NULL
  [0x30] → C → NULL               [0x40] → Y → Z → NULL
  [0x40] → NULL                   ...
  ...

Key properties of tcache:

  • Per-thread — no locking, extremely fast
  • Singly-linked LIFO — just like fast bins, only fd matters
  • 64 bins by default, covering chunk sizes from 24 up to 1032 bytes (on 64-bit)
  • 7 chunks per bin by default (tcache_count = 7)
  • No coalescing, no sanity checks (historically) — which is exactly why it became a favorite exploitation target

When you call free(ptr), glibc's logic roughly looks like this:

1. Is the chunk size within tcache range AND is the matching tcache bin
   not full (< 7 chunks)?  → Put it in tcache. Done.
2. Is the chunk size within fast bin range?  → Put it in fast bin. Done.
3. Is the chunk mmap'd?  → munmap it.
4. Otherwise → try to coalesce with neighbors, then put it in the unsorted bin.

And on malloc():

1. Is there a matching chunk in tcache?  → Return it immediately.
2. Check fast bins.
3. Check the small bin for exact size.
4. Walk the unsorted bin, sorting chunks as we go, returning an exact match.
5. Check large bins.
6. Carve a new chunk from the top chunk.

This is why, in our GDB demonstrations later in this article, we explicitly disable tcache with GLIBC_TUNABLES=glibc.malloc.tcache_count=0. Without that, our freed chunks would silently land in tcache and we would never see the unsorted-bin fd/bk pointers that make the internals understandable. Tcache hides all the interesting metadata behind a trivial linked list.

From an exploitation standpoint, tcache is a gift. Early versions had no double-free detection, no size checks, and no pointer mangling — which gave rise to primitives like tcache poisoning, where overwriting a freed chunk's fd pointer causes the next malloc() of that size to return an attacker-chosen address. Later glibc versions added a key field and pointer mangling (PROTECT_PTR) to harden this, but we'll get into those details in the next article.

Coalescing — Merging Free Chunks

If the allocator simply kept every freed chunk as-is, the heap would quickly turn into a graveyard of tiny, unusable fragments. You'd free() a hundred 16-byte chunks next to each other and still not be able to satisfy a single malloc(1024), even though the memory is physically contiguous. This problem is called external fragmentation, and the allocator fights it with coalescing — merging adjacent free chunks into one larger free chunk.

Coalescing happens inside free(), and only for chunks that go through the "slow path" (not fast bins and not tcache — both of those deliberately skip coalescing for speed). There are two directions:

Backward coalescing — merging with the previous chunk:

1. Read the current chunk's size field.
2. Check the PREV_INUSE bit (bit 0).
3. If it's 0, the previous chunk is free.
4. Read `prev_size` from the current chunk's header.
5. Walk backward: prev_chunk = current - prev_size.
6. Unlink the previous chunk from its bin.
7. New chunk = prev_chunk, new size = prev_size + current_size.

Forward coalescing — merging with the next chunk:

1. Compute next_chunk = current + current_size.
2. Look at the chunk AFTER next_chunk (by walking current + current_size + next_size).
3. Check that chunk's PREV_INUSE bit.
4. If it's 0, then next_chunk is free.
5. Unlink next_chunk from its bin.
6. New size = current_size + next_chunk_size.

Notice something important the allocator never stores a "this chunk is free" bit on the chunk itself. Instead, a chunk's freeness is recorded in the next chunk's PREV_INUSE bit. This is a classic space-saving trick there's no room in the header of a free chunk for more metadata (the fd/bk pointers need that space), so the information is pushed forward to the neighbor.

This tiny detail is the root of an entire class of exploits. If an attacker can forge a fake chunk header with PREV_INUSE = 0 and a controlled prev_size, free() will happily "walk backward" to an attacker-chosen address and treat whatever it finds there as a free chunk. The unlink that follows becomes a write-what-where primitive. We will see this in the next post on heap exploitation — for now, just remember: coalescing trusts metadata completely.

After coalescing, the resulting merged chunk is placed into the unsorted bin (not back into its original bin), where a future malloc() call will either reuse it directly or sort it into a small/large bin.

The Top Chunk (Wilderness)

At the very end of the heap sits a special chunk called the top chunk, also known affectionately as the wilderness. It represents all the unallocated memory at the top of the heap region — everything the program has requested from the OS but not yet carved up into smaller chunks.

You've already seen the top chunk in action in the GDB examples earlier. Remember this line?

0x5655a1b8: 0x00000000  0x00021e49  0x00000000  0x00000000

That 0x00021e49 is the top chunk's size field. It's huge (about 138 KB) because it's essentially "the rest of the heap." Each time a malloc() call can't find a suitable chunk in any bin, the allocator falls back to the top chunk it carves a new chunk off the front of the wilderness and shrinks the top chunk by that amount. That's why, between each malloc() in our earlier example, we saw the top chunk's size shrink from 0x21e490x21e390x21e29.

The top chunk has a few special properties:

  • It's never in any bin. The allocator keeps a direct pointer to it in the arena structure (av->top).
  • It has no neighbor to its right. Its PREV_INUSE bit is conceptually always set, and its "next chunk" doesn't exist — attempting to read it would walk off the end of the heap.
  • It absorbs adjacent frees. When you free a chunk that sits immediately before the top chunk, it doesn't go into a bin — it's merged directly into the wilderness. We saw this at the end of the earlier example, when freeing chunk C caused the top chunk to jump backward and swallow both B and C.
  • It grows the heap when exhausted. If the top chunk is too small to satisfy a request, the allocator calls sbrk() (for the main heap) or mmap() (for very large requests or secondary arenas) to extend it.
  • It can shrink. If the top chunk grows beyond M_TRIM_THRESHOLD (default 128 KB), glibc trims the excess back to the OS with a negative sbrk(). This is exactly the "heap trimming" we observed at the very end of the GDB walkthrough, where 0x21e49 became 0x20e49 after glibc returned a 4 KB page to the kernel.

From an attacker's perspective, the top chunk is interesting because its size field is just another 8-byte value sitting in memory. If a heap buffer overflow lets you overwrite it, you can forge a gigantic size, and then request a huge malloc() that makes the allocator's internal pointer "wrap around" to an arbitrary address. This is the classic House of Force technique. Again — a story for the next article.

The Arena

Finally, the wrapper that ties all of this together: the arena. An arena is glibc's internal bookkeeping structure for a heap. It holds every piece of state we've discussed so far — all the bins, the top chunk pointer, a lock, and references to the underlying memory regions.

Here's a simplified view of struct malloc_state (the arena):

struct malloc_state {
    mutex_t      mutex;            // Lock for thread safety
    int          flags;
    mfastbinptr  fastbinsY[10];    // Fast bins
    mchunkptr    top;              // The top chunk
    mchunkptr    last_remainder;   // Last split remainder
    mchunkptr    bins[254];        // Unsorted + small + large bins
    unsigned int binmap[4];        // Bitmap of non-empty bins
    struct malloc_state *next;     // Linked list of arenas
    ...
};

Every Linux process has at least one arena: the main arena, which lives as a static global inside libc itself. That's why, in our earlier GDB session, chunk B's fd and bk pointers after free(b) both pointed to 0xf7faf798 — a fixed address inside libc.so. That's the main arena's unsorted bin head.

Why multiple arenas?

In a single-threaded program, the main arena is all you ever need. But in a multithreaded program, every malloc() and free() would need to take the arena lock, serializing every thread on a single mutex. That would kill performance.

So glibc creates additional arenas on demand. When a thread calls malloc() and finds the main arena locked, glibc may spin up a new arena (up to 8 * number_of_cores by default) and assign that thread to it. Each extra arena has its own heap region, allocated via mmap() rather than sbrk(), and its own independent set of bins.

Main arena  →  sbrk() heap     (thread 1)
Arena 2     →  mmap'd region   (thread 2)
Arena 3     →  mmap'd region   (thread 3)
...

This is exactly what the NON_MAIN_ARENA flag (bit 2 of the size field, the A bit) is for. When free() sees a chunk with A = 1, it knows the chunk didn't come from the main arena and has to walk back to that chunk's own arena header to put the chunk in the right bin. glibc finds the arena header by masking the chunk's address down to a heap-aligned boundary and reading a small "heap_info" struct at the start of the mmap'd region.

Thread-local state

On top of arenas, each thread also carries two pieces of thread-local state:

  1. The tcache — its private fast-path cache (discussed above).
  2. An "arena hint" — the last arena this thread used, so it can reuse it without searching.

Put all of this together and the full hierarchy looks like this:

Thread 1:
    tcache (private)
        │
        ▼
    Main arena (shared, locked)
        │
        ├── fastbins[10]
        ├── bins[254]   (unsorted, small, large)
        ├── top chunk   →  sbrk() heap
        └── next →  Arena 2
                        │
                        ├── fastbins[10]
                        ├── bins[254]
                        ├── top chunk  →  mmap'd heap
                        └── next →  ...

Every malloc() drills down through this structure: tcache first, then fastbins, then small/large bins, then top chunk, then sbrk/mmap. Every free() does the reverse, depositing chunks into whichever layer accepts them.

That is the full picture of glibc's malloc. Chunks wrapped in headers, organized into bins, grouped under arenas, cached per-thread, all stitched together with pointer metadata stored inside the free memory itself. And that last detail — metadata living inside the very memory the program is free to overwrite — is the crack through which every heap exploit in the next article will squeeze.

Visualizing with GDB

Let's see real chunks in memory:

#include <stdlib.h>
#include <string.h>

int main() {
    char *ptr = malloc(15);

    strcpy(ptr, "hacksland.net");

    free(ptr);

    return 0;
}
$ gcc -m32 -g -o heap_demo heap_demo.c
$ gdb ./heap_demo
(gdb) disass main
Dump of assembler code for function main:
   0x0000119d <+0>: lea    ecx,[esp+0x4]
   0x000011a1 <+4>: and    esp,0xfffffff0
   0x000011a4 <+7>: push   DWORD PTR [ecx-0x4]
   0x000011a7 <+10>:    push   ebp
   0x000011a8 <+11>:    mov    ebp,esp
   0x000011aa <+13>:    push   ebx
   0x000011ab <+14>:    push   ecx
   0x000011ac <+15>:    sub    esp,0x10
   0x000011af <+18>:    call   0x10a0 <__x86.get_pc_thunk.bx>
   0x000011b4 <+23>:    add    ebx,0x2e20
   0x000011ba <+29>:    sub    esp,0xc
   0x000011bd <+32>:    push   0xf
   0x000011bf <+34>:    call   0x1050 <malloc@plt>
   0x000011c4 <+39>:    add    esp,0x10
   0x000011c7 <+42>:    mov    DWORD PTR [ebp-0xc],eax
   0x000011ca <+45>:    mov    eax,DWORD PTR [ebp-0xc]
   0x000011cd <+48>:    mov    DWORD PTR [eax],0x6b636168
   0x000011d3 <+54>:    mov    DWORD PTR [eax+0x4],0x6e616c73
   0x000011da <+61>:    mov    DWORD PTR [eax+0x8],0x656e2e64
   0x000011e1 <+68>:    mov    WORD PTR [eax+0xc],0x74
   0x000011e7 <+74>:    sub    esp,0xc
   0x000011ea <+77>:    push   DWORD PTR [ebp-0xc]
   0x000011ed <+80>:    call   0x1040 <free@plt>
   0x000011f2 <+85>:    add    esp,0x10
   0x000011f5 <+88>:    mov    eax,0x0
   0x000011fa <+93>:    lea    esp,[ebp-0x8]
   0x000011fd <+96>:    pop    ecx
   0x000011fe <+97>:    pop    ebx
   0x000011ff <+98>:    pop    ebp
   0x00001200 <+99>:    lea    esp,[ecx-0x4]
   0x00001203 <+102>:   ret
End of assembler dump.
(gdb) b *main+42
Breakpoint 1 at 0x11c7: file heap.c, line 5.
(gdb) b *main+80
Breakpoint 2 at 0x11ed: file heap.c, line 9.
(gdb) b *main+93
Breakpoint 3 at 0x11fa: file heap.c, line 12.
(gdb) i r eax
eax            0x5655a1a0          1448452512

So we know the pointer returned by the malloc() points to the memory address 0x5655a1a0. This should be in the heap area.

Lets examine the memory at this area.

(gdb) x/20wx $eax
0x5655a1a0: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1b0: 0x00000000  0x00000000  0x00000000  0x00021e49
0x5655a1c0: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1d0: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1e0: 0x00000000  0x00000000  0x00000000  0x00000000

But wait, Does this pointer points to the actual starting of the chunk? No right, as we discussed above it points to the user data.

If we want to get the actual address of the chunck we need to get ptr - 8.

0x5655a1a0 - 8 = 0x5655a198

Lets use GDB again to see what is in the actual starting point.

(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000  0x00000021  0x00000000  0x00000000
0x5655a1a8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00021e49  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00000000  0x00000000  0x00000000

so we can see these details,

prev. size = 0x00000000 size + flags = 0x00000011

Lets try to get the actual size, We know 0x00000011 holds both the size and flag related meta data.

0x00000021 = 33 13 = 32 + 1

(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000  0x00000021  0x6b636168  0x6e616c73
0x5655a1a8: 0x656e2e64  0x00000074  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00021e49  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00000000  0x00000000  0x00000000
(gdb) x/20wx 0x5655a1a0 - 8
0x5655a198: 0x00000000  0x00000021  0x0005655a  0xc0df952c
0x5655a1a8: 0x656e2e64  0x00000074  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00021e49  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00000000  0x00000000  0x00000000

I want to show you another interesting thing. Lets consider the bellow example.

#include <stdlib.h>
#include <string.h>

int main() {
    char *ptr = malloc(10);

    strcpy(ptr, "hacksland");

    free(ptr);

    return 0;
}
(gdb) disass main
Dump of assembler code for function main:
   0x0000119d <+0>: lea    ecx,[esp+0x4]
   0x000011a1 <+4>: and    esp,0xfffffff0
   0x000011a4 <+7>: push   DWORD PTR [ecx-0x4]
   0x000011a7 <+10>:    push   ebp
   0x000011a8 <+11>:    mov    ebp,esp
   0x000011aa <+13>:    push   ebx
   0x000011ab <+14>:    push   ecx
   0x000011ac <+15>:    sub    esp,0x10
   0x000011af <+18>:    call   0x10a0 <__x86.get_pc_thunk.bx>
   0x000011b4 <+23>:    add    ebx,0x2e20
   0x000011ba <+29>:    sub    esp,0xc
   0x000011bd <+32>:    push   0xa
   0x000011bf <+34>:    call   0x1050 <malloc@plt>
   0x000011c4 <+39>:    add    esp,0x10
   0x000011c7 <+42>:    mov    DWORD PTR [ebp-0xc],eax
   0x000011ca <+45>:    mov    eax,DWORD PTR [ebp-0xc]
   0x000011cd <+48>:    mov    DWORD PTR [eax],0x6b636168
   0x000011d3 <+54>:    mov    DWORD PTR [eax+0x4],0x6e616c73
   0x000011da <+61>:    mov    WORD PTR [eax+0x8],0x64
   0x000011e0 <+67>:    sub    esp,0xc
   0x000011e3 <+70>:    push   DWORD PTR [ebp-0xc]
   0x000011e6 <+73>:    call   0x1040 <free@plt>
   0x000011eb <+78>:    add    esp,0x10
   0x000011ee <+81>:    mov    eax,0x0
   0x000011f3 <+86>:    lea    esp,[ebp-0x8]
   0x000011f6 <+89>:    pop    ecx
   0x000011f7 <+90>:    pop    ebx
   0x000011f8 <+91>:    pop    ebp
   0x000011f9 <+92>:    lea    esp,[ecx-0x4]
   0x000011fc <+95>:    ret
End of assembler dump.

So we know the pointer returned by the malloc() points to the memory address 0x5655a1a0. This should be in the heap area.

Lets examine the memory at this area.

(gdb) x/10wx $eax
0x5655a1a0: 0x00000000  0x00000000  0x00000000  0x00021e59
0x5655a1b0: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1c0: 0x00000000  0x00000000
(gdb) x/10wx $eax - 8
0x5655a198: 0x00000000  0x00000011  0x00000000  0x00000000
0x5655a1a8: 0x00000000  0x00021e59  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000000

so we can see these details,

prev. size = 0x00000000 size + flags = 0x00000011

Lets try to get the actual size, We know 0x00000011 holds both the size and flag related meta data.

0x00000011 = 17 17 = 16 + 1

(gdb) x/10wx $eax - 8
0x5655a198: 0x00000000  0x00000011  0x6b636168  0x6e616c73
0x5655a1a8: 0x00000064  0x00021e59  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000000

Now we want to understand what actually happens when a chunk is getting free. I mentioned that the chunk's user data getting replaced by two pointers called forward pointer and backward pointer. Lets see this in action.

I wrote this simple C program to demonstrate this.

#include <stdlib.h>
#include <string.h>

#define M_MXFAST 1
extern int mallopt(int, int);

int main() {
    mallopt(M_MXFAST, 0);

    char *a = malloc(15);
    char *b = malloc(8);
    char *c = malloc(12);

    strcpy(a, "hacksland.net");    
    strcpy(b, "exploit");
    strcpy(c, "development");

    free(b);

    return 0;
}

Don't worry about the extra lines. These just disable fast bins. This is a feature introduced in modern glibc. But first of all we need to understand the fundamental. Since that I disabled the fast bins using above trick. You can understand all other code parts. The program allocates three chunks, then copies three strings.

Finally, it free one chunk.

Let's use GDB to examine this.

Again, We need to set an environment variable before debugging with GDB.

set GLIBC_TUNABLES glibc.malloc.tcache_count=0

This will effectively disable tcache which is another modern feature.

Here we can see the disassembly of the main function.

(gdb) disass main
Dump of assembler code for function main:
   0x000011ad <+0>: lea    ecx,[esp+0x4]
   0x000011b1 <+4>: and    esp,0xfffffff0
   0x000011b4 <+7>: push   DWORD PTR [ecx-0x4]
   0x000011b7 <+10>:    push   ebp
   0x000011b8 <+11>:    mov    ebp,esp
   0x000011ba <+13>:    push   ebx
   0x000011bb <+14>:    push   ecx
   0x000011bc <+15>:    sub    esp,0x10
   0x000011bf <+18>:    call   0x10b0 <__x86.get_pc_thunk.bx>
   0x000011c4 <+23>:    add    ebx,0x2e0c
   0x000011ca <+29>:    sub    esp,0x8
   0x000011cd <+32>:    push   0x0
   0x000011cf <+34>:    push   0x1
   0x000011d1 <+36>:    call   0x1060 <mallopt@plt>
   0x000011d6 <+41>:    add    esp,0x10
   0x000011d9 <+44>:    sub    esp,0xc
   0x000011dc <+47>:    push   0xf
   0x000011de <+49>:    call   0x1050 <malloc@plt>
   0x000011e3 <+54>:    add    esp,0x10
   0x000011e6 <+57>:    mov    DWORD PTR [ebp-0xc],eax
   0x000011e9 <+60>:    sub    esp,0xc
   0x000011ec <+63>:    push   0x8
   0x000011ee <+65>:    call   0x1050 <malloc@plt>
   0x000011f3 <+70>:    add    esp,0x10
   0x000011f6 <+73>:    mov    DWORD PTR [ebp-0x10],eax
   0x000011f9 <+76>:    sub    esp,0xc
   0x000011fc <+79>:    push   0xc
   0x000011fe <+81>:    call   0x1050 <malloc@plt>
   0x00001203 <+86>:    add    esp,0x10
   0x00001206 <+89>:    mov    DWORD PTR [ebp-0x14],eax
   0x00001209 <+92>:    mov    eax,DWORD PTR [ebp-0xc]
   0x0000120c <+95>:    mov    DWORD PTR [eax],0x6b636168
   0x00001212 <+101>:   mov    DWORD PTR [eax+0x4],0x6e616c73
   0x00001219 <+108>:   mov    DWORD PTR [eax+0x8],0x656e2e64
   0x00001220 <+115>:   mov    WORD PTR [eax+0xc],0x74
   0x00001226 <+121>:   mov    eax,DWORD PTR [ebp-0x10]
   0x00001229 <+124>:   mov    DWORD PTR [eax],0x6c707865
   0x0000122f <+130>:   mov    DWORD PTR [eax+0x4],0x74696f
   0x00001236 <+137>:   mov    eax,DWORD PTR [ebp-0x14]
   0x00001239 <+140>:   mov    DWORD PTR [eax],0x65766564
   0x0000123f <+146>:   mov    DWORD PTR [eax+0x4],0x6d706f6c
   0x00001246 <+153>:   mov    DWORD PTR [eax+0x8],0x746e65
   0x0000124d <+160>:   sub    esp,0xc
   0x00001250 <+163>:   push   DWORD PTR [ebp-0x10]
   0x00001253 <+166>:   call   0x1040 <free@plt>
   0x00001258 <+171>:   add    esp,0x10
   0x0000125b <+174>:   sub    esp,0xc
   0x0000125e <+177>:   push   DWORD PTR [ebp-0x14]
   0x00001261 <+180>:   call   0x1040 <free@plt>
   0x00001266 <+185>:   add    esp,0x10
   0x00001269 <+188>:   mov    eax,0x0
   0x0000126e <+193>:   lea    esp,[ebp-0x8]
   0x00001271 <+196>:   pop    ecx
   0x00001272 <+197>:   pop    ebx
   0x00001273 <+198>:   pop    ebp
   0x00001274 <+199>:   lea    esp,[ecx-0x4]
   0x00001277 <+202>:   ret
End of assembler dump.

It is much similar to above codes. Just allocating three chunks. Now I set some breakpoints and start examining.

(gdb) b *main+57
Breakpoint 1 at 0x11e6: file heap.c, line 9.
(gdb) b *main+73
Breakpoint 2 at 0x11f6: file heap.c, line 10.
(gdb) b *main+89
Breakpoint 3 at 0x1206: file heap.c, line 11.
(gdb) b *main+166
Breakpoint 4 at 0x1253: file heap.c, line 17.
(gdb) b *main+174
Breakpoint 5 at 0x125b: file heap.c, line 19.

So the first breakpoint hits right after malloc(15) returns. The eax register holds the returned pointer 0x5655a1a0. Let's examine 8 bytes before it to see the chunk header.

(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000  0x00000021  0x00000000  0x00000000
0x5655a1a8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00021e49  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00000000  0x00000000  0x00000000

The first chunk has been carved from the top. The prev_size is 0x00000000 and the size field is 0x00000021. Remember, the lowest bit is the PREV_INUSE flag, so the actual chunk size is 0x20 (32 bytes). The remaining space at 0x5655a1b8 is the top chunk (also called the wilderness) — its size is 0x00021e49. After continuing to the second breakpoint, malloc(8) returns and a new chunk appears:

(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000  0x00000021  0x00000000  0x00000000
0x5655a1a8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000011  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00021e39  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00000000  0x00000000  0x00000000

The second chunk has a size of 0x11, meaning 0x10 (16 bytes) with the PREV_INUSE bit set. The top chunk shrank from 0x00021e49 to 0x00021e39 — it gave away 0x10 bytes and moved forward. After the third malloc(12), same pattern:

(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000  0x00000021  0x00000000  0x00000000
0x5655a1a8: 0x00000000  0x00000000  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000011  0x00000000  0x00000000
0x5655a1c8: 0x00000000  0x00000011  0x00000000  0x00000000
0x5655a1d8: 0x00000000  0x00021e29  0x00000000  0x00000000

Another 0x10-sized chunk. Interestingly, both malloc(8) and malloc(12) produce the same chunk size. This is because of alignment. glibc rounds up the requested size plus overhead to the nearest 16-byte boundary: align_up(8 + 4, 16) = 16 and align_up(12 + 4, 16) = 16. The top chunk shrank again to 0x00021e29. At this point we have three chunks lined up on the heap:

Chunk A: malloc(15)
0x5655a198:  [prev_size: 0x00] [size: 0x21]
0x5655a1a0:  [       24 bytes user data      ]

         Chunk B: malloc(8)
0x5655a1b8:  [prev_size: 0x00] [size: 0x11]
0x5655a1c0:  [        8 bytes user data      ]

         Chunk C: malloc(12)
0x5655a1c8:  [prev_size: 0x00] [size: 0x11]
0x5655a1d0:  [        8 bytes user data      ]

         Top Chunk
0x5655a1d8:  [prev_size: 0x00] [size: 0x21e29]

Notice that every chunk's prev_size is zero. This field only becomes meaningful when the previous chunk is free. Since all chunks are currently allocated, glibc doesn't bother writing to it. At breakpoint 4, the strcpy calls have filled in the user data. The compiler actually optimized these into direct MOV instructions. The heap now looks like this:

(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000  0x00000021  0x6b636168  0x6e616c73
0x5655a1a8: 0x656e2e64  0x00000074  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000011  0x6c707865  0x0074696f
0x5655a1c8: 0x00000000  0x00000011  0x65766564  0x6d706f6c
0x5655a1d8: 0x00746e65  0x00021e29  0x00000000  0x00000000

The strings are stored in little-endian byte order. For example, chunk A contains "hacksland.net" — 0x6b636168 is "hack" read backwards byte by byte, 0x6e616c73 is "slan", and so on. Chunk B holds "exploit" and chunk C holds "development". Now the interesting part.

We continue past free(b) and examine the heap again:

(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000  0x00000021  0x6b636168  0x6e616c73
0x5655a1a8: 0x656e2e64  0x00000074  0x00000000  0x00000000
0x5655a1b8: 0x00000000  0x00000011  0xf7faf798  0xf7faf798
0x5655a1c8: 0x00000010  0x00000010  0x65766564  0x6d706f6c
0x5655a1d8: 0x00746e65  0x00021e29  0x00000000  0x00000000

Three things changed. Let's look at each one.

  1. Chunk B's user data was replaced by fd and bk pointers
  • Before: 0x6c707865 0x0074696f ("exploit")
  • After: 0xf7faf798 0xf7faf798 (fd and bk)

The string "exploit" is gone. In its place are two identical pointers: 0xf7faf798. These are the forward pointer (fd) and backward pointer (bk). They both point to the same address — the unsorted bin head inside glibc's main_arena structure. The unsorted bin is a circular doubly-linked list. Since chunk B is the only free chunk in the bin, it points both forward and backward to the list head itself.

  1. Chunk C's prev_size is now populated
  • Before: 0x00000000
  • After: 0x00000010

When glibc freed chunk B, it wrote B's chunk size (0x10) into chunk C's prev_size field. This is how glibc prepares for backward coalescing. If chunk C is freed later, glibc reads this prev_size, subtracts it from C's address, and finds chunk B's header — then merges both chunks into one larger free chunk.

  1. Chunk C's PREV_INUSE bit was cleared
  • Before: 0x00000011 (0x10 | 0x1) → previous chunk is IN USE
  • After: 0x00000010 (0x10 | 0x0) → previous chunk is FREE

This is the critical flag. glibc cleared the lowest bit of chunk C's size field. This tells any future operation: "the chunk before me is free." Without this bit, free(c) would not even attempt to look at prev_size or coalesce backwards.

Now we continue past free(c):

(gdb) x/20wx 0x5655a198 0x5655a198: 0x00000000 0x00000021 0x6b636168 0x6e616c73 0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000 0x5655a1b8: 0x00000000 0x00020e49 0xf7faf798 0xf7faf798 0x5655a1c8: 0x00000010 0x00000010 0x65766564 0x6d706f6c 0x5655a1d8: 0x00746e65 0x00021e29 0x00000000 0x00000000

Look at what changed at 0x5655a1bc — the size field. It was 0x00000011 (chunk B's size). Now it's 0x00020e49. The top chunk has swallowed chunks B and C entirely and moved backward to 0x5655a1b8. Here's what glibc did step by step: Step 1: Check backward — is the previous chunk free? glibc looks at chunk C's PREV_INUSE bit. It's 0 — the previous chunk (B) is free. So glibc reads C's prev_size (0x10), subtracts it from C's address, and walks backward to find chunk B: B's header = 0x5655a1c8 - 0x10 = 0x5655a1b8 ✓ It then unlinks chunk B from the unsorted bin — removing it from the doubly-linked list using the fd and bk pointers. The two chunks merge: merged size = B's size + C's size = 0x10 + 0x10 = 0x20 Step 2: Check forward — is the next chunk the top chunk? The chunk after C starts at 0x5655a1c8 + 0x10 = 0x5655a1d8 — that's the top chunk. When a free chunk is adjacent to the wilderness, glibc doesn't bother putting it in a bin. It absorbs everything into the top chunk: new top starts at: 0x5655a1b8 new top size = 0x20 (merged B+C) + 0x21e29 (old top) = 0x21e49 But wait — the size shows 0x20e49, not 0x21e49. That's heap trimming in action. The combined top chunk exceeded glibc's M_TRIM_THRESHOLD (default 128KB). So glibc called sbrk with a negative value to return one page (4096 = 0x1000 bytes) back to the operating system: 0x21e49 - 0x1000 = 0x20e49 ✓ The kernel reclaimed a page of memory. This is glibc being a good citizen — if a program frees a large amount of memory, it gives some back rather than hoarding it forever.

ALSO READ
Remote Code Execution (RCE)
Jan 02 Application Security

Remote Code Execution (RCE) is the holy grail of application security vulnerabilities. It allows an attacker to execute arbitrary code on a remote server — and the consequences are as bad as it sounds. In this post, we'll go deep into RCE across multiple languages, including PHP, Java, Python, and Node.js.

Understanding the Heap Internals
Apr 12 Exploit development

So far in this series, we've exploited the **stack** buffer overflows, ROP chains, format strings. The stack is predictable: local variables go in, function returns pop them out, everything follows a...

Common Web Application Attacks
Feb 05 Application Security

Web applications are one of the most targeted surfaces by attackers. This is primarily because they are accessible over the internet, making them exposed and potentially vulnerable. Since these...

How I built a web based CPU Simulator
May 07 Pet Projects

As someone passionate about computer engineering, reverse engineering, and system internals, I've always been fascinated by what happens "under the hood" of a computer. This curiosity led me to...

Identity and Access Management (IAM)
May 11 Identity & Access Management

Who are you — and what are you allowed to do? That's the fundamental question every secure system must answer. And it's exactly what Identity and Access Management (IAM) is built to solve.

Access Control Models
Apr 08 Identity & Access Management

Access control is one of the most fundamental concepts in security. Every time you set file permissions, assign user roles, or restrict access to a resource, you're implementing some form of access control. But not all access control is created equal...