Understanding the Heap Internals
So far in this series, we've exploited stack buffer overflows, built ROP chains, and abused format strings. The stack is predictable: local variables go in, function returns pop them out, and everything follows a strict LIFO order.
The heap is a completely different beast. It's the dynamic memory managed by malloc() and free(). Programs request memory of arbitrary sizes at arbitrary times and free it in any order. The heap allocator has to manage this chaos efficiently, and the bookkeeping metadata it uses to track allocations is exactly what attackers target.
Heap exploitation is how most modern exploits work. Browser exploits, kernel exploits, server exploits: the majority involve heap corruption. The stack is too well-protected (canaries, ASLR, DEP). The heap is where the action is.
But before we can exploit the heap, we need to understand how it works. This article covers the internals of glibc's malloc (ptmalloc2) — the default allocator on Linux. The next article covers the actual exploitation techniques.
Fair warning: this is going to be a lengthy article, but I didn't want to split it into multiple posts. By the end you'll have the complete picture of the heap that you need for heap-based buffer overflows.
Stack vs Heap — The Fundamental Difference
Stack:
- Fixed size (usually 8MB per thread)
- Grows/shrinks automatically with function calls
- LIFO order — last allocated, first freed
- Managed by the compiler (push/pop)
- Local variables, function arguments, return addresses
Heap:
- Dynamic size (grows as needed)
- Allocated/freed explicitly by the programmer
- No ordering — free in any order
- Managed by the allocator (malloc/free)
- Dynamically sized data, objects, buffers
To understand the difference, let's consider a simple C function.
void memoryAllocation() {
    int x = 42;              // Stack — automatic, freed when function returns
    char *buf = malloc(256); // Heap — manual, stays until free() is called
    free(buf);               // Programmer must free explicitly
}
The stack is simple. The heap is complex. And complexity breeds vulnerabilities.
malloc() and free() — The Interface
The programmer sees a simple API:
void *malloc(size_t size); // Allocate 'size' bytes, return pointer
void free(void *ptr); // Free previously allocated memory
void *realloc(void *ptr, size_t size); // Resize allocation
void *calloc(size_t nmemb, size_t size); // Allocate and zero-initialize
Behind this simple interface, glibc's malloc maintains a sophisticated system of chunks, bins, and arenas to manage memory efficiently.
Chunks — The Building Block
The first and most important concept in this tutorial is the chunk. As a first step, let's see what a chunk is and what it looks like.
Every malloc() allocation is wrapped in a chunk.

Each chunk contains metadata (header) and the user data.
Allocated Chunk Layout
The actual layout of a chunk looks something like this.

We know that whenever we call malloc() we receive a pointer (assuming the allocation succeeded). This pointer always points to the user data, not to the start of the chunk. On a 32-bit architecture the metadata is 8 bytes in size, so to get the address of the start of the chunk we have to subtract 8 from the pointer returned by malloc(). We need to keep this in mind.
Now let's dive into the chunk and see how it looks.

The metadata section has two parts.
Prev size
This indicates the size of the previous chunk. Actually, it doesn't always hold that size; I'll explain later. For now, keep in mind that it represents the size of the previous chunk.
Size + Flags
As the name suggests, this field stores the chunk's total size, but the bottom 3 bits are repurposed as flags.
Wait, how is that possible? Let me explain.
The chunk size is always a multiple of 8: 8 bytes, 16 bytes, 24 bytes, and so on.
$$ 8_{10} = 00001000_2 $$
$$ 16_{10} = 00010000_2 $$
$$ 24_{10} = 00011000_2 $$
$$ 32_{10} = 00100000_2 $$
So the last 3 bits of the size would always be zero. We can use these three bits as the space for the flags.
size field = (actual size in bytes) | A | M | P |
| Bit | Name | Meaning |
|---|---|---|
| Bit 0 | P (PREV_INUSE) | Previous chunk is in use (not free) |
| Bit 1 | M (IS_MMAPPED) | Chunk was allocated via mmap() |
| Bit 2 | A (NON_MAIN_ARENA) | Chunk belongs to a non-main arena (threading) |
The PREV_INUSE bit is critical for exploitation. It tells the allocator whether the chunk before this one is free.
Let's consider an example. Suppose we see the following as the metadata for the current chunk.
size + flags = 0x00000021
Now,
$$ 00000021_{16} = 33_{10} $$
So this does not mean the actual size is 33 bytes.
To get the actual size, we need to convert it into binary and disregard the last three bits,
$$ 33_{10} = 0100001_2 $$
After clearing the last three bits it becomes 0100000.
$$ 0100000_2 = 32_{10} $$
Programmatically, we can do the same thing with a bitwise AND against 11111000, which clears the last three bits.
$$ 0100001_{2} \land 11111000_{2} = 0100000_{2} $$
Once freed, a chunk is tracked in one of several structures, depending on its size and the glibc version:
- tcache (modern glibc, default)
- fastbins
- unsorted / small / large bins
Bins — Where Free Chunks Live
There are four types of bins.
Fast Bins
A fast bin is built on a linked list, more specifically a singly linked list. Finally, an actual use for linked lists :).
Fastbins handle chunks from 0x10 (16 bytes) up to 0x50 (80 bytes) TOTAL chunk size.
So valid fastbin chunk sizes:
0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x40, 0x48, 0x50
Another important property: they're LIFO. That means last freed, first allocated. Don't worry, I'll explain this clearly.
Below is the layout of a chunk after it has been moved into a fast bin.
As I mentioned, there is a single pointer called fd, and it is placed in the user data section. After free(), the user data is no longer needed, so that space can be reused for the pointer.
So what does this fd pointer do? It links free chunks together: the fastbin area is ultimately a collection of singly linked lists (the fast bins).
Fastbin[0]: chunk_c → chunk_b → chunk_a → NULL
(size 16)
Fastbin[1]: chunk_f → chunk_e → NULL
(size 24)
...up to 10 fastbins
Key properties:
- No coalescing — adjacent free chunks in fast bins are NOT merged together
- LIFO order — free(a); free(b); malloc(same_size) returns b (last freed, first out)
- Singly linked — only fd, no bk
- PREV_INUSE bit is not cleared — the next chunk doesn't know this chunk is free
Fast bins exist for performance. Small, frequent allocations (the most common case) are served extremely quickly.
Unsorted Bin
When a non-fastbin chunk is freed, it first goes into the unsorted bin — a single doubly-linked list that acts as a staging area.
When a chunk is freed, the user data area is repurposed to store free list pointers:

The fd and bk pointers form a doubly-linked list of free chunks. This is where the allocator finds memory to satisfy future malloc() calls.
These pointers occupy the same memory that was previously user data. The allocator knows it's safe to write here because the chunk is free — nobody should be reading this memory anymore. (Spoiler: use-after-free exploits violate this assumption.)
Unsorted bin: chunk_x ⟷ chunk_y ⟷ chunk_z
When malloc() can't find a fast bin match, it searches the unsorted bin. Each chunk in the unsorted bin is either:
- Returned directly if it's the right size
- Sorted into the appropriate small bin or large bin
Small Bins
Small bins hold chunks of a specific size. There are 62 small bins, each for a different chunk size (16, 24, 32, ..., 504 bytes on 32-bit).
Smallbin[0]: chunks of size 16
Smallbin[1]: chunks of size 24
...
Smallbin[61]: chunks of size 504
Small bins are doubly-linked (FIFO). They're exact-fit — malloc(24) only checks the smallbin for size 24.
Large Bins
Large bins hold chunks of a range of sizes. There are 63 large bins, and each covers an increasingly wide range:
Largebin[0]: 512-568 bytes
Largebin[1]: 576-632 bytes
...
Largebin[62]: very large chunks
Large bins are sorted within each bin (largest first), so the allocator can find the best fit.
Tcache — The Modern Fast Path
Starting with glibc 2.26 (released in 2017), a new layer was added on top of everything we've discussed so far: the thread-local cache, or tcache. If you're on a modern Linux distribution, tcache is the very first thing malloc() checks and the very first place free() puts a chunk.
The motivation is performance. Fast bins were fast, but they were still part of the main arena — accessing them required taking an arena lock in multithreaded programs. Tcache sidesteps this entirely: each thread gets its own private cache, no locks needed.
Thread 1 tcache: Thread 2 tcache:
[0x20] → A → B → NULL [0x20] → X → NULL
[0x30] → C → NULL [0x40] → Y → Z → NULL
[0x40] → NULL ...
...
Key properties of tcache:
- Per-thread — no locking, extremely fast
- Singly-linked LIFO — just like fast bins, only fd matters
- 64 bins by default, covering chunk sizes from 24 up to 1032 bytes (on 64-bit)
- 7 chunks per bin by default (tcache_count = 7)
- No coalescing, no sanity checks (historically) — which is exactly why it became a favorite exploitation target
When you call free(ptr), glibc's logic roughly looks like this:
1. Is the chunk size within tcache range AND is the matching tcache bin
not full (< 7 chunks)? → Put it in tcache. Done.
2. Is the chunk size within fast bin range? → Put it in fast bin. Done.
3. Is the chunk mmap'd? → munmap it.
4. Otherwise → try to coalesce with neighbors, then put it in the unsorted bin.
And on malloc():
1. Is there a matching chunk in tcache? → Return it immediately.
2. Check fast bins.
3. Check the small bin for exact size.
4. Walk the unsorted bin, sorting chunks as we go, returning an exact match.
5. Check large bins.
6. Carve a new chunk from the top chunk.
This is why, in our GDB demonstrations later in this article, we explicitly disable tcache with GLIBC_TUNABLES=glibc.malloc.tcache_count=0. Without that, our freed chunks would silently land in tcache and we would never see the unsorted-bin fd/bk pointers that make the internals understandable. Tcache hides all the interesting metadata behind a trivial linked list.
From an exploitation standpoint, tcache is a gift. Early versions had no double-free detection, no size checks, and no pointer mangling — which gave rise to primitives like tcache poisoning, where overwriting a freed chunk's fd pointer causes the next malloc() of that size to return an attacker-chosen address. Later glibc versions added a key field and pointer mangling (PROTECT_PTR) to harden this, but we'll get into those details in the next article.
Coalescing — Merging Free Chunks
If the allocator simply kept every freed chunk as-is, the heap would quickly turn into a graveyard of tiny, unusable fragments. You'd free() a hundred 16-byte chunks next to each other and still not be able to satisfy a single malloc(1024), even though the memory is physically contiguous. This problem is called external fragmentation, and the allocator fights it with coalescing — merging adjacent free chunks into one larger free chunk.
Coalescing happens inside free(), and only for chunks that go through the "slow path" (not fast bins and not tcache — both of those deliberately skip coalescing for speed). There are two directions:
Backward coalescing — merging with the previous chunk:
1. Read the current chunk's size field.
2. Check the PREV_INUSE bit (bit 0).
3. If it's 0, the previous chunk is free.
4. Read `prev_size` from the current chunk's header.
5. Walk backward: prev_chunk = current - prev_size.
6. Unlink the previous chunk from its bin.
7. New chunk = prev_chunk, new size = prev_size + current_size.
Forward coalescing — merging with the next chunk:
1. Compute next_chunk = current + current_size.
2. Look at the chunk AFTER next_chunk (by walking current + current_size + next_size).
3. Check that chunk's PREV_INUSE bit.
4. If it's 0, then next_chunk is free.
5. Unlink next_chunk from its bin.
6. New size = current_size + next_chunk_size.
Notice something important: the allocator never stores a "this chunk is free" bit on the chunk itself. Instead, a chunk's freeness is recorded in the next chunk's PREV_INUSE bit. This is a classic space-saving trick: there's no room in the header of a free chunk for more metadata (the fd/bk pointers need that space), so the information is pushed forward to the neighbor.
This tiny detail is the root of an entire class of exploits. If an attacker can forge a fake chunk header with PREV_INUSE = 0 and a controlled prev_size, free() will happily "walk backward" to an attacker-chosen address and treat whatever it finds there as a free chunk. The unlink that follows becomes a write-what-where primitive. We will see this in the next post on heap exploitation — for now, just remember: coalescing trusts metadata completely.
After coalescing, the resulting merged chunk is placed into the unsorted bin (not back into its original bin), where a future malloc() call will either reuse it directly or sort it into a small/large bin.
The Top Chunk (Wilderness)
At the very end of the heap sits a special chunk called the top chunk, also known affectionately as the wilderness. It represents all the unallocated memory at the top of the heap region — everything the program has requested from the OS but not yet carved up into smaller chunks.
We'll see the top chunk in action in the GDB examples later in this article. Keep an eye out for this line:
0x5655a1b8: 0x00000000 0x00021e49 0x00000000 0x00000000
That 0x00021e49 is the top chunk's size field. It's huge (about 138 KB) because it's essentially "the rest of the heap." Each time a malloc() call can't find a suitable chunk in any bin, the allocator falls back to the top chunk: it carves a new chunk off the front of the wilderness and shrinks the top chunk by that amount. That's why, in the GDB walkthrough later in this article, we'll see the top chunk's size shrink from 0x21e49 → 0x21e39 → 0x21e29 between each malloc().
The top chunk has a few special properties:
- It's never in any bin. The allocator keeps a direct pointer to it in the arena structure (av->top).
- It has no neighbor to its right. Its PREV_INUSE bit is conceptually always set, and its "next chunk" doesn't exist — attempting to read it would walk off the end of the heap.
- It absorbs adjacent frees. When you free a chunk that sits immediately before the top chunk, it doesn't go into a bin — it's merged directly into the wilderness. This is what happens at the end of the GDB walkthrough in this article, when freeing chunk C causes the top chunk to jump backward and swallow both B and C.
- It grows the heap when exhausted. If the top chunk is too small to satisfy a request, the allocator calls sbrk() (for the main heap) or mmap() (for very large requests or secondary arenas) to extend it.
- It can shrink. If the top chunk grows beyond M_TRIM_THRESHOLD (default 128 KB), glibc trims the excess back to the OS with a negative sbrk(). This is exactly the "heap trimming" we observe at the very end of the GDB walkthrough, where 0x21e49 becomes 0x20e49 after glibc returns a 4 KB page to the kernel.
From an attacker's perspective, the top chunk is interesting because its size field is just another 8-byte value sitting in memory. If a heap buffer overflow lets you overwrite it, you can forge a gigantic size, and then request a huge malloc() that makes the allocator's internal pointer "wrap around" to an arbitrary address. This is the classic House of Force technique. Again — a story for the next article.
The Arena
Finally, the wrapper that ties all of this together: the arena. An arena is glibc's internal bookkeeping structure for a heap. It holds every piece of state we've discussed so far — all the bins, the top chunk pointer, a lock, and references to the underlying memory regions.
Here's a simplified view of struct malloc_state (the arena):
struct malloc_state {
mutex_t mutex; // Lock for thread safety
int flags;
mfastbinptr fastbinsY[10]; // Fast bins
mchunkptr top; // The top chunk
mchunkptr last_remainder; // Last split remainder
mchunkptr bins[254]; // Unsorted + small + large bins
unsigned int binmap[4]; // Bitmap of non-empty bins
struct malloc_state *next; // Linked list of arenas
...
};
Every Linux process has at least one arena: the main arena, which lives as a static global inside libc itself. That's why, in the GDB session later in this article, chunk B's fd and bk pointers after free(b) both point to 0xf7faf798, a fixed address inside libc.so. That's the main arena's unsorted bin head.
Why multiple arenas?
In a single-threaded program, the main arena is all you ever need. But in a multithreaded program, every malloc() and free() would need to take the arena lock, serializing every thread on a single mutex. That would kill performance.
So glibc creates additional arenas on demand. When a thread calls malloc() and finds the main arena locked, glibc may spin up a new arena (up to 8 * number_of_cores by default) and assign that thread to it. Each extra arena has its own heap region, allocated via mmap() rather than sbrk(), and its own independent set of bins.
Main arena → sbrk() heap (thread 1)
Arena 2 → mmap'd region (thread 2)
Arena 3 → mmap'd region (thread 3)
...
This is exactly what the NON_MAIN_ARENA flag (bit 2 of the size field, the A bit) is for. When free() sees a chunk with A = 1, it knows the chunk didn't come from the main arena and has to walk back to that chunk's own arena header to put the chunk in the right bin. glibc finds the arena header by masking the chunk's address down to a heap-aligned boundary and reading a small "heap_info" struct at the start of the mmap'd region.
Thread-local state
On top of arenas, each thread also carries two pieces of thread-local state:
- The tcache — its private fast-path cache (discussed above).
- An "arena hint" — the last arena this thread used, so it can reuse it without searching.
Put all of this together and the full hierarchy looks like this:
Thread 1:
tcache (private)
│
▼
Main arena (shared, locked)
│
├── fastbins[10]
├── bins[254] (unsorted, small, large)
├── top chunk → sbrk() heap
└── next → Arena 2
│
├── fastbins[10]
├── bins[254]
├── top chunk → mmap'd heap
└── next → ...
Every malloc() drills down through this structure: tcache first, then fastbins, then small/large bins, then top chunk, then sbrk/mmap. Every free() does the reverse, depositing chunks into whichever layer accepts them.
That is the full picture of glibc's malloc. Chunks wrapped in headers, organized into bins, grouped under arenas, cached per-thread, all stitched together with pointer metadata stored inside the free memory itself. And that last detail — metadata living inside the very memory the program is free to overwrite — is the crack through which every heap exploit in the next article will squeeze.
Visualizing with GDB
Let's see real chunks in memory:
#include <stdlib.h>
#include <string.h>

int main() {
    char *ptr = malloc(15);
    strcpy(ptr, "hacksland.net");
    free(ptr);
    return 0;
}
$ gcc -m32 -g -o heap_demo heap_demo.c
$ gdb ./heap_demo
(gdb) disass main
Dump of assembler code for function main:
0x0000119d <+0>: lea ecx,[esp+0x4]
0x000011a1 <+4>: and esp,0xfffffff0
0x000011a4 <+7>: push DWORD PTR [ecx-0x4]
0x000011a7 <+10>: push ebp
0x000011a8 <+11>: mov ebp,esp
0x000011aa <+13>: push ebx
0x000011ab <+14>: push ecx
0x000011ac <+15>: sub esp,0x10
0x000011af <+18>: call 0x10a0 <__x86.get_pc_thunk.bx>
0x000011b4 <+23>: add ebx,0x2e20
0x000011ba <+29>: sub esp,0xc
0x000011bd <+32>: push 0xf
0x000011bf <+34>: call 0x1050 <malloc@plt>
0x000011c4 <+39>: add esp,0x10
0x000011c7 <+42>: mov DWORD PTR [ebp-0xc],eax
0x000011ca <+45>: mov eax,DWORD PTR [ebp-0xc]
0x000011cd <+48>: mov DWORD PTR [eax],0x6b636168
0x000011d3 <+54>: mov DWORD PTR [eax+0x4],0x6e616c73
0x000011da <+61>: mov DWORD PTR [eax+0x8],0x656e2e64
0x000011e1 <+68>: mov WORD PTR [eax+0xc],0x74
0x000011e7 <+74>: sub esp,0xc
0x000011ea <+77>: push DWORD PTR [ebp-0xc]
0x000011ed <+80>: call 0x1040 <free@plt>
0x000011f2 <+85>: add esp,0x10
0x000011f5 <+88>: mov eax,0x0
0x000011fa <+93>: lea esp,[ebp-0x8]
0x000011fd <+96>: pop ecx
0x000011fe <+97>: pop ebx
0x000011ff <+98>: pop ebp
0x00001200 <+99>: lea esp,[ecx-0x4]
0x00001203 <+102>: ret
End of assembler dump.
(gdb) b *main+42
Breakpoint 1 at 0x11c7: file heap.c, line 5.
(gdb) b *main+80
Breakpoint 2 at 0x11ed: file heap.c, line 9.
(gdb) b *main+93
Breakpoint 3 at 0x11fa: file heap.c, line 12.
(gdb) i r eax
eax 0x5655a1a0 1448452512
So the pointer returned by malloc() points to memory address 0x5655a1a0, which should be in the heap area.
Let's examine the memory in this area.
(gdb) x/20wx $eax
0x5655a1a0: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1b0: 0x00000000 0x00000000 0x00000000 0x00021e49
0x5655a1c0: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1e0: 0x00000000 0x00000000 0x00000000 0x00000000
But wait, does this pointer point to the actual start of the chunk? No. As we discussed above, it points to the user data.
If we want the actual starting address of the chunk, we need to compute ptr - 8.
0x5655a1a0 - 8 = 0x5655a198
Let's use GDB again to see what's at the actual start of the chunk.
(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000 0x00000021 0x00000000 0x00000000
0x5655a1a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00021e49 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00000000 0x00000000 0x00000000
so we can see these details,
prev. size = 0x00000000 size + flags = 0x00000021
Let's work out the actual size. We know 0x00000021 holds both the size and the flag bits.
0x00000021 = 33
33 = 32 + 1
So the actual chunk size is 32 bytes, and the 1 is the PREV_INUSE flag.
At the second breakpoint, just before free() is called, the strcpy() has filled in the user data:
(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000 0x00000021 0x6b636168 0x6e616c73
0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00021e49 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00000000 0x00000000 0x00000000
And at the third breakpoint, after free(ptr) has run, the first two words of the user data change again:
(gdb) x/20wx 0x5655a1a0 - 8
0x5655a198: 0x00000000 0x00000021 0x0005655a 0xc0df952c
0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00021e49 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00000000 0x00000000 0x00000000
The first part of "hacksland.net" is gone. Because tcache is enabled here, the freed chunk went into a tcache bin: the first word (0x0005655a) is the chunk's "next" pointer (recent glibc mangles tcache pointers by XORing them with the chunk address shifted right by 12, so a NULL next shows up as 0x5655a1a0 >> 12 = 0x0005655a), and the second word is the tcache "key" used to detect double frees. We'll see the classic fd/bk pointers of the normal bins in the larger example below.
I want to show you another interesting thing. Let's consider the example below.
#include <stdlib.h>
#include <string.h>

int main() {
    char *ptr = malloc(10);
    strcpy(ptr, "hacksland");
    free(ptr);
    return 0;
}
(gdb) disass main
Dump of assembler code for function main:
0x0000119d <+0>: lea ecx,[esp+0x4]
0x000011a1 <+4>: and esp,0xfffffff0
0x000011a4 <+7>: push DWORD PTR [ecx-0x4]
0x000011a7 <+10>: push ebp
0x000011a8 <+11>: mov ebp,esp
0x000011aa <+13>: push ebx
0x000011ab <+14>: push ecx
0x000011ac <+15>: sub esp,0x10
0x000011af <+18>: call 0x10a0 <__x86.get_pc_thunk.bx>
0x000011b4 <+23>: add ebx,0x2e20
0x000011ba <+29>: sub esp,0xc
0x000011bd <+32>: push 0xa
0x000011bf <+34>: call 0x1050 <malloc@plt>
0x000011c4 <+39>: add esp,0x10
0x000011c7 <+42>: mov DWORD PTR [ebp-0xc],eax
0x000011ca <+45>: mov eax,DWORD PTR [ebp-0xc]
0x000011cd <+48>: mov DWORD PTR [eax],0x6b636168
0x000011d3 <+54>: mov DWORD PTR [eax+0x4],0x6e616c73
0x000011da <+61>: mov WORD PTR [eax+0x8],0x64
0x000011e0 <+67>: sub esp,0xc
0x000011e3 <+70>: push DWORD PTR [ebp-0xc]
0x000011e6 <+73>: call 0x1040 <free@plt>
0x000011eb <+78>: add esp,0x10
0x000011ee <+81>: mov eax,0x0
0x000011f3 <+86>: lea esp,[ebp-0x8]
0x000011f6 <+89>: pop ecx
0x000011f7 <+90>: pop ebx
0x000011f8 <+91>: pop ebp
0x000011f9 <+92>: lea esp,[ecx-0x4]
0x000011fc <+95>: ret
End of assembler dump.
Again the pointer returned by malloc() points to memory address 0x5655a1a0, which should be in the heap area.
Let's examine the memory in this area.
(gdb) x/10wx $eax
0x5655a1a0: 0x00000000 0x00000000 0x00000000 0x00021e59
0x5655a1b0: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1c0: 0x00000000 0x00000000
(gdb) x/10wx $eax - 8
0x5655a198: 0x00000000 0x00000011 0x00000000 0x00000000
0x5655a1a8: 0x00000000 0x00021e59 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000000
so we can see these details,
prev. size = 0x00000000 size + flags = 0x00000011
Let's work out the actual size. We know 0x00000011 holds both the size and the flag bits.
0x00000011 = 17
17 = 16 + 1
So the actual chunk size is 16 bytes, and the 1 is the PREV_INUSE flag.
(gdb) x/10wx $eax - 8
0x5655a198: 0x00000000 0x00000011 0x6b636168 0x6e616c73
0x5655a1a8: 0x00000064 0x00021e59 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000000
Now we want to understand what actually happens when a chunk gets freed. I mentioned that the chunk's user data gets replaced by two pointers, the forward pointer and the backward pointer. Let's see this in action.
I wrote this simple C program to demonstrate it.
#include <stdlib.h>
#include <string.h>

#define M_MXFAST 1
extern int mallopt(int, int);

int main() {
    mallopt(M_MXFAST, 0);
    char *a = malloc(15);
    char *b = malloc(8);
    char *c = malloc(12);
    strcpy(a, "hacksland.net");
    strcpy(b, "exploit");
    strcpy(c, "development");
    free(b);
    free(c);
    return 0;
}
Don't worry about the extra lines; they just disable fast bins, a feature of modern glibc. We first need to understand the fundamentals, so I disabled the fast bins with this trick. The rest of the code should be familiar: the program allocates three chunks, then copies three strings into them.
Finally, it frees two chunks.
Let's use GDB to examine this.
Again, we need to set an environment variable inside GDB before running:
set environment GLIBC_TUNABLES glibc.malloc.tcache_count=0
This effectively disables tcache, another modern feature.
Here we can see the disassembly of the main function.
(gdb) disass main
Dump of assembler code for function main:
0x000011ad <+0>: lea ecx,[esp+0x4]
0x000011b1 <+4>: and esp,0xfffffff0
0x000011b4 <+7>: push DWORD PTR [ecx-0x4]
0x000011b7 <+10>: push ebp
0x000011b8 <+11>: mov ebp,esp
0x000011ba <+13>: push ebx
0x000011bb <+14>: push ecx
0x000011bc <+15>: sub esp,0x10
0x000011bf <+18>: call 0x10b0 <__x86.get_pc_thunk.bx>
0x000011c4 <+23>: add ebx,0x2e0c
0x000011ca <+29>: sub esp,0x8
0x000011cd <+32>: push 0x0
0x000011cf <+34>: push 0x1
0x000011d1 <+36>: call 0x1060 <mallopt@plt>
0x000011d6 <+41>: add esp,0x10
0x000011d9 <+44>: sub esp,0xc
0x000011dc <+47>: push 0xf
0x000011de <+49>: call 0x1050 <malloc@plt>
0x000011e3 <+54>: add esp,0x10
0x000011e6 <+57>: mov DWORD PTR [ebp-0xc],eax
0x000011e9 <+60>: sub esp,0xc
0x000011ec <+63>: push 0x8
0x000011ee <+65>: call 0x1050 <malloc@plt>
0x000011f3 <+70>: add esp,0x10
0x000011f6 <+73>: mov DWORD PTR [ebp-0x10],eax
0x000011f9 <+76>: sub esp,0xc
0x000011fc <+79>: push 0xc
0x000011fe <+81>: call 0x1050 <malloc@plt>
0x00001203 <+86>: add esp,0x10
0x00001206 <+89>: mov DWORD PTR [ebp-0x14],eax
0x00001209 <+92>: mov eax,DWORD PTR [ebp-0xc]
0x0000120c <+95>: mov DWORD PTR [eax],0x6b636168
0x00001212 <+101>: mov DWORD PTR [eax+0x4],0x6e616c73
0x00001219 <+108>: mov DWORD PTR [eax+0x8],0x656e2e64
0x00001220 <+115>: mov WORD PTR [eax+0xc],0x74
0x00001226 <+121>: mov eax,DWORD PTR [ebp-0x10]
0x00001229 <+124>: mov DWORD PTR [eax],0x6c707865
0x0000122f <+130>: mov DWORD PTR [eax+0x4],0x74696f
0x00001236 <+137>: mov eax,DWORD PTR [ebp-0x14]
0x00001239 <+140>: mov DWORD PTR [eax],0x65766564
0x0000123f <+146>: mov DWORD PTR [eax+0x4],0x6d706f6c
0x00001246 <+153>: mov DWORD PTR [eax+0x8],0x746e65
0x0000124d <+160>: sub esp,0xc
0x00001250 <+163>: push DWORD PTR [ebp-0x10]
0x00001253 <+166>: call 0x1040 <free@plt>
0x00001258 <+171>: add esp,0x10
0x0000125b <+174>: sub esp,0xc
0x0000125e <+177>: push DWORD PTR [ebp-0x14]
0x00001261 <+180>: call 0x1040 <free@plt>
0x00001266 <+185>: add esp,0x10
0x00001269 <+188>: mov eax,0x0
0x0000126e <+193>: lea esp,[ebp-0x8]
0x00001271 <+196>: pop ecx
0x00001272 <+197>: pop ebx
0x00001273 <+198>: pop ebp
0x00001274 <+199>: lea esp,[ecx-0x4]
0x00001277 <+202>: ret
End of assembler dump.
It is quite similar to the listings above: just allocating three chunks and freeing two. Now I set some breakpoints and start examining.
(gdb) b *main+57
Breakpoint 1 at 0x11e6: file heap.c, line 9.
(gdb) b *main+73
Breakpoint 2 at 0x11f6: file heap.c, line 10.
(gdb) b *main+89
Breakpoint 3 at 0x1206: file heap.c, line 11.
(gdb) b *main+166
Breakpoint 4 at 0x1253: file heap.c, line 17.
(gdb) b *main+174
Breakpoint 5 at 0x125b: file heap.c, line 19.
So the first breakpoint hits right after malloc(15) returns. The eax register holds the returned pointer 0x5655a1a0. Let's examine 8 bytes before it to see the chunk header.
(gdb) x/20wx $eax - 8
0x5655a198: 0x00000000 0x00000021 0x00000000 0x00000000
0x5655a1a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00021e49 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00000000 0x00000000 0x00000000
The first chunk has been carved from the top. The prev_size is 0x00000000 and the size field is 0x00000021. Remember, the lowest bit is the PREV_INUSE flag, so the actual chunk size is 0x20 (32 bytes). The remaining space at 0x5655a1b8 is the top chunk (also called the wilderness) — its size is 0x00021e49. After continuing to the second breakpoint, malloc(8) returns and a new chunk appears:
(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000 0x00000021 0x00000000 0x00000000
0x5655a1a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000011 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00021e39 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00000000 0x00000000 0x00000000
The second chunk has a size of 0x11, meaning 0x10 (16 bytes) with the PREV_INUSE bit set. The top chunk shrank from 0x00021e49 to 0x00021e39 — it gave away 0x10 bytes and moved forward. After the third malloc(12), same pattern:
(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000 0x00000021 0x00000000 0x00000000
0x5655a1a8: 0x00000000 0x00000000 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000011 0x00000000 0x00000000
0x5655a1c8: 0x00000000 0x00000011 0x00000000 0x00000000
0x5655a1d8: 0x00000000 0x00021e29 0x00000000 0x00000000
Another 0x10-sized chunk. Interestingly, both malloc(8) and malloc(12) produce the same chunk size. This is because of alignment. glibc rounds up the requested size plus overhead to the nearest 16-byte boundary: align_up(8 + 4, 16) = 16 and align_up(12 + 4, 16) = 16. The top chunk shrank again to 0x00021e29. At this point we have three chunks lined up on the heap:
Chunk A: malloc(15)
0x5655a198: [prev_size: 0x00] [size: 0x21]
0x5655a1a0: [ 24 bytes user data ]
Chunk B: malloc(8)
0x5655a1b8: [prev_size: 0x00] [size: 0x11]
0x5655a1c0: [ 8 bytes user data ]
Chunk C: malloc(12)
0x5655a1c8: [prev_size: 0x00] [size: 0x11]
0x5655a1d0: [ 8 bytes user data ]
Top Chunk
0x5655a1d8: [prev_size: 0x00] [size: 0x21e29]
Notice that every chunk's prev_size is zero. This field only becomes meaningful when the previous chunk is free. Since all chunks are currently allocated, glibc doesn't bother writing to it. At breakpoint 4, the strcpy calls have filled in the user data. The compiler actually optimized these into direct MOV instructions. The heap now looks like this:
(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000 0x00000021 0x6b636168 0x6e616c73
0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000011 0x6c707865 0x0074696f
0x5655a1c8: 0x00000000 0x00000011 0x65766564 0x6d706f6c
0x5655a1d8: 0x00746e65 0x00021e29 0x00000000 0x00000000
The strings are stored in little-endian byte order. For example, chunk A contains "hacksland.net" — 0x6b636168 is "hack" read backwards byte by byte, 0x6e616c73 is "slan", and so on. Chunk B holds "exploit" and chunk C holds "development". Now the interesting part.
We continue past free(b) and examine the heap again:
(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000 0x00000021 0x6b636168 0x6e616c73
0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00000011 0xf7faf798 0xf7faf798
0x5655a1c8: 0x00000010 0x00000010 0x65766564 0x6d706f6c
0x5655a1d8: 0x00746e65 0x00021e29 0x00000000 0x00000000
Three things changed. Let's look at each one.
- Chunk B's user data was replaced by fd and bk pointers
- Before: 0x6c707865 0x0074696f ("exploit")
- After: 0xf7faf798 0xf7faf798 (fd and bk)
The string "exploit" is gone. In its place are two identical pointers: 0xf7faf798. These are the forward pointer (fd) and backward pointer (bk). They both point to the same address — the unsorted bin head inside glibc's main_arena structure. The unsorted bin is a circular doubly-linked list. Since chunk B is the only free chunk in the bin, it points both forward and backward to the list head itself.
- Chunk C's prev_size is now populated
- Before: 0x00000000
- After: 0x00000010
When glibc freed chunk B, it wrote B's chunk size (0x10) into chunk C's prev_size field. This is how glibc prepares for backward coalescing. If chunk C is freed later, glibc reads this prev_size, subtracts it from C's address, and finds chunk B's header — then merges both chunks into one larger free chunk.
- Chunk C's PREV_INUSE bit was cleared
- Before: 0x00000011 (0x10 | 0x1) → previous chunk is IN USE
- After: 0x00000010 (0x10 | 0x0) → previous chunk is FREE
This is the critical flag. glibc cleared the lowest bit of chunk C's size field. This tells any future operation: "the chunk before me is free." Without this bit, free(c) would not even attempt to look at prev_size or coalesce backwards.
Now we continue past free(c):
(gdb) x/20wx 0x5655a198
0x5655a198: 0x00000000 0x00000021 0x6b636168 0x6e616c73
0x5655a1a8: 0x656e2e64 0x00000074 0x00000000 0x00000000
0x5655a1b8: 0x00000000 0x00020e49 0xf7faf798 0xf7faf798
0x5655a1c8: 0x00000010 0x00000010 0x65766564 0x6d706f6c
0x5655a1d8: 0x00746e65 0x00021e29 0x00000000 0x00000000
Look at what changed at 0x5655a1bc — the size field. It was 0x00000011 (chunk B's size). Now it's 0x00020e49. The top chunk has swallowed chunks B and C entirely and moved backward to 0x5655a1b8. Here's what glibc did step by step.
Step 1: Check backward — is the previous chunk free? glibc looks at chunk C's PREV_INUSE bit. It's 0 — the previous chunk (B) is free. So glibc reads C's prev_size (0x10), subtracts it from C's address, and walks backward to find chunk B:
B's header = 0x5655a1c8 - 0x10 = 0x5655a1b8 ✓
It then unlinks chunk B from the unsorted bin — removing it from the doubly-linked list using the fd and bk pointers. The two chunks merge:
merged size = B's size + C's size = 0x10 + 0x10 = 0x20
Step 2: Check forward — is the next chunk the top chunk? The chunk after C starts at 0x5655a1c8 + 0x10 = 0x5655a1d8 — that's the top chunk. When a free chunk is adjacent to the wilderness, glibc doesn't bother putting it in a bin. It absorbs everything into the top chunk:
new top starts at: 0x5655a1b8
new top size = 0x20 (merged B+C) + 0x21e29 (old top) = 0x21e49
But wait — the size shows 0x20e49, not 0x21e49. That's heap trimming in action. The combined top chunk exceeded glibc's M_TRIM_THRESHOLD (default 128KB). So glibc called sbrk with a negative value to return one page (4096 = 0x1000 bytes) back to the operating system:
0x21e49 - 0x1000 = 0x20e49 ✓
The kernel reclaimed a page of memory. This is glibc being a good citizen — if a program frees a large amount of memory, it gives some back rather than hoarding it forever.