GDB Reverse Engineering Tutorial
Here’s a challenge. You have a compiled binary. No source code. No documentation. No debug symbols. All you have is the executable file. Can you figure out what it does — not just by running it, but by understanding the logic inside it, instruction by instruction?
That’s reverse engineering. And GDB (the GNU Debugger) is one of the best tools for doing it on Linux. In this tutorial, we’ll take a simple binary, disassemble it, step through it in GDB, and reconstruct the original C source code from nothing but assembly.
What is GDB?
GDB is a command-line debugger for Linux. It lets you:
- Disassemble compiled code back into assembly instructions
- Set breakpoints to pause execution at specific instructions
- Step through code one instruction at a time
- Inspect registers and memory to see values as they change
- Examine the stack to understand function calls and local variables
It’s the go-to tool for exploit developers, reverse engineers, and anyone who needs to understand what a binary is actually doing under the hood. Think of it as a microscope for programs.
The Target Binary
Let’s start by running our mystery binary to see what it does:
user@protostar:~$ ./rev
HacksLand
user@protostar:~$
It prints the string “HacksLand” and exits. Simple enough. Based on this output, we might guess the source code looks something like:
#include <stdio.h>

int main() {
    printf("HacksLand\n");
    return 0;
}
But is it really that simple? Let’s find out. There might be variables, conditions, loops — things that don’t show up in the output but are part of the program’s logic. The only way to know for sure is to disassemble it.
Loading the Binary in GDB
user@protostar:~$ gdb -q ./rev
Reading symbols from /home/user/rev...(no debugging symbols found)...done.
The -q flag (quiet) suppresses GDB’s startup banner. The “(no debugging symbols found)” message tells us the binary was compiled without debug info — we can’t see variable names or line numbers. That’s typical in real-world reverse engineering.
First, let’s switch to Intel syntax. GDB defaults to AT&T syntax, but many reverse engineers find Intel syntax easier to read: the destination operand comes first, and registers and constants carry no % or $ prefixes.
(gdb) set disassembly-flavor intel
Now let’s disassemble the main function:
(gdb) disass main
Dump of assembler code for function main:
0x080483c4 <+0>: push ebp
0x080483c5 <+1>: mov ebp,esp
0x080483c7 <+3>: sub esp,0x10
0x080483ca <+6>: mov DWORD PTR [ebp-0xc],0x2
0x080483d1 <+13>: mov DWORD PTR [ebp-0x8],0x3
0x080483d8 <+20>: mov eax,DWORD PTR [ebp-0x8]
0x080483db <+23>: mov edx,DWORD PTR [ebp-0xc]
0x080483de <+26>: lea eax,[edx+eax*1]
0x080483e1 <+29>: mov DWORD PTR [ebp-0x4],eax
0x080483e4 <+32>: cmp DWORD PTR [ebp-0x4],0x7
0x080483e8 <+36>: jg 0x80483f8 <main+52>
0x080483ea <+38>: mov DWORD PTR [esp],0x80484d0
0x080483f1 <+45>: call 0x80482f8 <puts@plt>
0x080483f6 <+50>: jmp 0x8048404 <main+64>
0x080483f8 <+52>: mov DWORD PTR [esp],0x80484da
0x080483ff <+59>: call 0x80482f8 <puts@plt>
0x08048404 <+64>: mov eax,0x0
0x08048409 <+69>: leave
0x0804840a <+70>: ret
End of assembler dump.
That’s 19 instructions. It looks intimidating at first, but we’re going to go through every single one. By the end, you’ll read this as naturally as C code.
The Function Prologue — Setting Up the Stack Frame
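The first three instructions are called the function prologue. You’ll find this pattern at the start of most functions compiled with a frame pointer, which is what GCC produces without optimizations (the amount subtracted from ESP depends on how much stack space the function needs):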
0x080483c4 <+0>: push ebp
0x080483c5 <+1>: mov ebp,esp
0x080483c7 <+3>: sub esp,0x10
push ebp: Saves the caller’s base pointer on the stack. We need to restore it later when this function returns.
mov ebp, esp: Sets up the new base pointer. From this point on, EBP is the function’s fixed reference point: local variables live at negative offsets from it (such as [ebp-0x4]), and any arguments passed to the function start at [ebp+0x8], just above the saved EBP and the return address.
sub esp, 0x10: Allocates 16 bytes (0x10) of stack space for the local variables and for the argument of the upcoming puts call. Since the stack grows downward (toward lower addresses), subtracting from ESP reserves space.
After the prologue, the stack looks like this:
Higher Addresses
┌──────────────────────────┐
│ Return address           │ ← [ebp+0x4], pushed by the CALL instruction
├──────────────────────────┤
│ Saved EBP (old)          │ ← [ebp], pushed by "push ebp"
├──────────────────────────┤ ◄── EBP points here
│ [ebp-0x4] (4 bytes)      │ ← local variable (z)
├──────────────────────────┤
│ [ebp-0x8] (4 bytes)      │ ← local variable (y)
├──────────────────────────┤
│ [ebp-0xc] (4 bytes)      │ ← local variable (x)
├──────────────────────────┤
│ [ebp-0x10] (4 bytes)     │ ← argument slot ([esp]), later holds the puts argument
└──────────────────────────┘ ◄── ESP points here
Lower Addresses
Three local variables at [ebp-0x4], [ebp-0x8], and [ebp-0xc]. Let’s see what gets stored in them.
Initializing Variables
0x080483ca <+6>: mov DWORD PTR [ebp-0xc],0x2
0x080483d1 <+13>: mov DWORD PTR [ebp-0x8],0x3
These two instructions store values into the local variable space on the stack:
- [ebp-0xc] gets the value 2 (0x2)
- [ebp-0x8] gets the value 3 (0x3)
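DWORD PTR means we’re writing a 4-byte (32-bit) value, so these are most likely int variables. (An unsigned int would look identical at this point; the signed JG comparison later in the function is a further hint that they are signed.)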
In C terms, this is:
int x = 2;
int y = 3;
Let’s verify in GDB. We’ll set a breakpoint before these instructions execute, then step through and watch the values change:
(gdb) b *0x080483ca
Breakpoint 1 at 0x80483ca
(gdb) run
Starting program: /home/user/rev
Breakpoint 1, 0x080483ca in main ()
Now let’s examine the memory at [ebp-0xc] before the instruction runs:
(gdb) x/x $ebp-0xc
0xbffff7dc: 0xb7fd7ff4
Garbage value — uninitialized stack memory. Now step one instruction and check again:
(gdb) ni
0x080483d1 in main ()
(gdb) x/x $ebp-0xc
0xbffff7dc: 0x00000002
The value at [ebp-0xc] is now 2. Let’s do the same for [ebp-0x8]:
(gdb) x/x $ebp-0x8
0xbffff7e0: 0x08048420
(gdb) ni
0x080483d8 in main ()
(gdb) x/x $ebp-0x8
0xbffff7e0: 0x00000003
Now [ebp-0x8] contains 3. Exactly as expected. Two local integer variables initialized to 2 and 3.
The Addition
0x080483d8 <+20>: mov eax,DWORD PTR [ebp-0x8]
0x080483db <+23>: mov edx,DWORD PTR [ebp-0xc]
0x080483de <+26>: lea eax,[edx+eax*1]
0x080483e1 <+29>: mov DWORD PTR [ebp-0x4],eax
The first two instructions load our variables from the stack into registers:
- EAX = value at [ebp-0x8] = 3
- EDX = value at [ebp-0xc] = 2
You might wonder why the compiler doesn’t operate on the stack values directly. On x86, an arithmetic instruction can take at most one memory operand, so adding two values that both live in memory requires loading at least one of them into a register first. Without optimizations, the compiler keeps things simple: it loads every value it needs into a register, operates on the registers, then stores the result back to the stack.
Now the interesting instruction:
lea eax,[edx+eax*1]
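LEA stands for Load Effective Address. It evaluates an address expression but never accesses memory at that address, and it doesn’t modify the flags, so compilers commonly use it as a cheap way to do arithmetic. [edx+eax*1] calculates EDX + EAX × 1 = EDX + EAX = 2 + 3 = 5, and the result is stored in EAX.
Let’s verify. We’re still stopped at 0x080483d8, so first run ni twice to execute the two loads; that leaves us at the LEA instruction (0x080483de). Then check the registers, step over the LEA, and check EAX again: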
(gdb) i r eax edx
eax 0x3 3
edx 0x2 2
(gdb) ni
0x080483e1 in main ()
(gdb) i r eax
eax 0x5 5
3 + 2 = 5. The i r command (short for info registers) shows register values.
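To make the arithmetic concrete, here is a small C sketch of what an LEA instruction writes to its destination register. The name lea_effective is hypothetical, registers are modeled as 32-bit unsigned values so wraparound matches the hardware, and the scale is assumed to be one of the encodable values 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Model of `lea dest, [base + index*scale + disp]`: returns the value LEA
 * writes to dest. LEA only performs this address arithmetic; it never
 * reads memory and does not change the flags. Values are uint32_t so the
 * result wraps around exactly like a 32-bit register. */
uint32_t lea_effective(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    /* scale is assumed to be 1, 2, 4, or 8, the only encodable values */
    return base + index * scale + (uint32_t)disp;
}
```

Here, lea eax,[edx+eax*1] corresponds to lea_effective(2, 3, 1, 0), which returns 5.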
Then mov DWORD PTR [ebp-0x4], eax stores the result (5) into the third local variable at [ebp-0x4].
So far, our reconstructed code is:
int x = 2;
int y = 3;
int z;
z = x + y; // z = 5
The Comparison and Conditional Jump
Now things get interesting:
0x080483e4 <+32>: cmp DWORD PTR [ebp-0x4],0x7
0x080483e8 <+36>: jg 0x80483f8 <main+52>
cmp DWORD PTR [ebp-0x4], 0x7: Compares the value at [ebp-0x4] (which is 5) with 7. Internally, CMP performs a subtraction (5 - 7 = -2) without storing the result, but it sets the flags register based on the outcome:
- Zero Flag (ZF) = 0 (the result isn’t zero, so the operands are not equal)
- Sign Flag (SF) = 1 (the result is negative)
- Overflow Flag (OF) = 0 (the signed subtraction didn’t overflow, so the sign of the result can be trusted: the first operand is smaller)
jg 0x80483f8: “Jump if Greater.” This is a signed comparison that jumps to 0x80483f8 only if the first operand of the previous CMP was greater than the second. In terms of flags, JG is taken when ZF = 0 and SF = OF. Here SF = 1 and OF = 0, so the answer to “is 5 > 7?” is no: the jump is not taken, and execution falls through to the next instruction.
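To see how those flags decide the jump, here is a minimal C model of CMP and JG. It assumes 32-bit two’s-complement operands; cmp32, jg_taken, and struct cmp_flags are hypothetical names, and only the three flags JG depends on are modeled (the real CMP also sets CF, PF, and AF).

```c
#include <stdbool.h>
#include <stdint.h>

/* The flags JG looks at after `cmp a, b` (CF, PF and AF are not modeled). */
struct cmp_flags {
    bool zf; /* Zero Flag: the result a - b was zero, i.e. a == b */
    bool sf; /* Sign Flag: top bit of the 32-bit result */
    bool of; /* Overflow Flag: the signed result did not fit in 32 bits */
};

/* Model of `cmp a, b`: compute a - b, discard it, keep only the flags. */
struct cmp_flags cmp32(int32_t a, int32_t b)
{
    /* Subtract as unsigned so the 32-bit wraparound is well defined in C. */
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t r = ua - ub;
    struct cmp_flags f;

    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    /* Signed overflow in a - b: the operands' signs differ and the
     * result's sign differs from a's sign. */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) & 1u;
    return f;
}

/* JG is a signed test: taken when ZF == 0 and SF == OF. */
bool jg_taken(struct cmp_flags f)
{
    return !f.zf && f.sf == f.of;
}
```

cmp32(5, 7) yields ZF = 0, SF = 1, OF = 0, so jg_taken returns false, matching main. Comparing INT32_MIN with 1 shows why JG checks SF = OF rather than SF alone: the subtraction wraps around to a positive number (SF = 0), but OF = 1, so the jump is still correctly not taken.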
This is an if statement. Read literally, the CMP + JG pair means:
if (z > 7) {
    // jump taken: continue at 0x80483f8
} else {
    // jump not taken: fall through to the next instruction (0x80483ea)
}
Notice the layout, though: the code for the not-taken case sits directly after the test, and the jump skips over it. That is the usual way compilers lay out an if/else. They place the first block right after the comparison and branch past it when the source condition is false, which means the jump tests the opposite of what the source says. So if the source had been written as:
if (z <= 7) { do_A; } else { do_B; }
the compiler would generate:
    cmp z, 7
    jg do_B        ; if z > 7, skip to B
    do_A           ; otherwise, fall through to A
    jmp end        ; skip over B
do_B:
    (code for B)
end:
The condition is flipped so that the first block of the source becomes the fall-through path. This layout suggests the author wrote the “HacksLand” branch first, under a condition like z <= 7, but if (z > 7) with the branches swapped is logically identical, so either is a valid reconstruction.
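To connect the compiler’s layout back to C, here is the same branch written with goto so that each statement lines up with a piece of the assembly. hacksland_message is a hypothetical name, and it returns the message rather than calling puts so that it is easy to test. Writing if (z <= 7) with the “HacksLand” branch first, or if (z > 7) with the branches swapped, gives exactly the same behavior.

```c
/* Hypothetical rewrite of main's if/else using goto, laid out in the same
 * order as the machine code. It returns the message instead of printing
 * it so the control flow is easy to test. */
const char *hacksland_message(int z)
{
    const char *msg;

    if (z > 7)                    /* cmp DWORD PTR [ebp-0x4],0x7 ; jg main+52 */
        goto else_branch;
    msg = "HacksLand";            /* fall-through block at main+38 */
    goto end;                     /* jmp main+64 */

else_branch:
    msg = "HacksLand Overflow";   /* jump target at main+52 */

end:
    return msg;                   /* epilogue at main+64 */
}
```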
The Two Branches
Now let’s look at both paths.
The “if” Branch (z <= 7 — falls through)
0x080483ea <+38>: mov DWORD PTR [esp],0x80484d0
0x080483f1 <+45>: call 0x80482f8 <puts@plt>
0x080483f6 <+50>: jmp 0x8048404 <main+64>
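mov DWORD PTR [esp], 0x80484d0: Writes the address 0x80484d0 into the slot at the top of the stack ([esp], the same slot as [ebp-0x10]). This is the argument to the function call that follows: on 32-bit x86 Linux, function arguments are passed on the stack, and because the prologue already reserved space for it, the compiler can store the argument directly instead of using push.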
What’s at address 0x80484d0? Let’s check in GDB:
(gdb) x/s 0x80484d0
0x80484d0: "HacksLand"
There it is — the string “HacksLand” sitting in the .rodata (read-only data) section.
call 0x80482f8 <puts@plt>: Calls the puts function (not printf). When the format string contains no format specifiers and ends with a newline, GCC commonly replaces printf("string\n") with puts("string"), since puts appends the newline itself and skips format parsing.
jmp 0x8048404 <main+64>: After printing, this unconditional jump skips over the else branch and goes directly to the function epilogue. Without this jump, execution would fall through into the else branch and print both strings.
The “else” Branch (z > 7 — jump target)
0x080483f8 <+52>: mov DWORD PTR [esp],0x80484da
0x080483ff <+59>: call 0x80482f8 <puts@plt>
Same pattern — load a string address and call puts. But what string is at 0x80484da?
(gdb) x/s 0x80484da
0x80484da: "HacksLand Overflow"
A different string! So there’s an else branch we never saw when running the program, because the condition z > 7 (5 > 7) was false.
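The two addresses are exactly 0x80484da - 0x80484d0 = 0xa = 10 bytes apart, and “HacksLand” plus its terminating NUL byte is exactly 10 bytes. That is consistent with the compiler storing the two literals back to back in .rodata. A small C model of that layout, where rodata_model, first_string_size, and rodata_string_at are illustrative names and the real section may hold other data as well:

```c
#include <stddef.h>

/* Model of the two string literals stored back to back: each literal
 * keeps its terminating NUL byte, and the second starts right after it. */
static const char rodata_model[] = "HacksLand\0HacksLand Overflow";

/* Size of the first literal including its NUL: 9 characters + 1 = 10 (0xa). */
size_t first_string_size(void)
{
    return sizeof("HacksLand");
}

/* Return the string that starts `offset` bytes into the modeled section. */
const char *rodata_string_at(size_t offset)
{
    return rodata_model + offset;
}
```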
The Function Epilogue
0x08048404 <+64>: mov eax,0x0
0x08048409 <+69>: leave
0x0804840a <+70>: ret
mov eax, 0x0 — Sets the return value to 0. In the C calling convention, the return value of a function goes in EAX. This is return 0;.
leave — This is shorthand for mov esp, ebp followed by pop ebp. It undoes the function prologue — restores the stack pointer and the caller’s base pointer.
ret — Pops the return address from the stack and jumps to it, returning control to whichever function called main.
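The prologue and epilogue are exact inverses, which is easy to check with a toy model. The sketch below simulates only ESP, EBP, EIP and a tiny stack; the struct, the function names, and all addresses are made up for illustration, and there is no bounds checking. It performs a call, main’s three-instruction prologue, and then leave and ret.

```c
#include <stdint.h>

/* Toy 32-bit stack: 64 four-byte slots covering the made-up address
 * range [STACK_LOW, STACK_LOW + 0x100). No bounds checking. */
enum { STACK_LOW = 0x1000, STACK_SLOTS = 64 };

struct cpu {
    uint32_t esp, ebp, eip;
    uint32_t slot[STACK_SLOTS];
};

/* push: the stack grows downward, so decrement ESP first, then store. */
static void push(struct cpu *c, uint32_t value)
{
    c->esp -= 4;
    c->slot[(c->esp - STACK_LOW) / 4] = value;
}

/* pop: load the value at ESP, then move ESP back up. */
static uint32_t pop(struct cpu *c)
{
    uint32_t value = c->slot[(c->esp - STACK_LOW) / 4];
    c->esp += 4;
    return value;
}

/* call: push the return address, then jump to the function. */
void call_fn(struct cpu *c, uint32_t return_addr, uint32_t target)
{
    push(c, return_addr);
    c->eip = target;
}

/* The three prologue instructions of main. */
void prologue(struct cpu *c, uint32_t frame_size)
{
    push(c, c->ebp);        /* push ebp */
    c->ebp = c->esp;        /* mov ebp,esp */
    c->esp -= frame_size;   /* sub esp,0x10 when frame_size == 0x10 */
}

/* leave (mov esp,ebp; pop ebp) followed by ret. */
void leave_and_ret(struct cpu *c)
{
    c->esp = c->ebp;        /* leave, part 1: discard the locals */
    c->ebp = pop(c);        /* leave, part 2: restore the caller's EBP */
    c->eip = pop(c);        /* ret: pop the return address into EIP */
}
```

After leave_and_ret, ESP and EBP hold exactly the values they had before the call, and EIP holds the return address, which is what lets the caller carry on as if nothing happened.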
The Reconstructed Source Code
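Putting it all together, we can now reconstruct C source that behaves exactly like the binary. The names x, y, and z are our own invention (variable names were never stored in the binary), and because the compiler inverts conditions, the original may just as well have been written as if (z <= 7) with the two branches swapped: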
#include <stdio.h>

int main() {
    int x = 2;
    int y = 3;
    int z;
    z = x + y;  // z = 5
    if (z > 7) {
        printf("HacksLand Overflow\n");
    } else {
        printf("HacksLand\n");
    }
    return 0;
}
When we ran the binary, we only saw “HacksLand” — the else branch. We had no idea about the condition, the math, or the alternative output. But by reading the assembly instruction by instruction, we reconstructed the complete program logic, including a branch that never executed.
That’s the power of reverse engineering.
GDB Commands Quick Reference
Here’s a summary of every GDB command we used, plus a few more that are essential for reverse engineering:
Navigation
| Command | Short | What It Does |
|---|---|---|
| disassemble main | disas main | Disassemble a function |
| disassemble 0x08048060, 0x08048080 | disas 0x08048060, 0x08048080 | Disassemble an address range |
| break *0x080483ca | b *0x080483ca | Set a breakpoint at an address |
| run | r | Start the program |
| continue | c | Continue after a breakpoint |
| nexti | ni | Step one instruction (step over calls) |
| stepi | si | Step one instruction (step into calls) |
Inspection
| Command | Short | What It Does |
|---|---|---|
| info registers | i r | Show the general-purpose registers, EIP, and flags (info all-registers shows everything) |
| info registers eax edx | i r eax edx | Show specific registers |
| x/x $ebp-0xc | | Examine memory as hex |
| x/s 0x80484d0 | | Examine memory as a string |
| x/10i $eip | | Disassemble 10 instructions at EIP |
| x/20xw $esp | | Examine 20 hex words at the stack pointer |
The x Command Formats
The x (examine) command is the most versatile tool in GDB. The format is x/NFU address:
- N = number of units to display
- F = format: x (hex), d (decimal), s (string), i (instruction), c (char)
- U = unit size: b (byte), h (halfword, 2 bytes), w (word, 4 bytes), g (giant, 8 bytes)
(gdb) x/4xw $esp — 4 hex words at ESP (the top of the stack)
(gdb) x/16xb $eax — 16 hex bytes at the address in EAX
(gdb) x/s 0x80484d0 — string at that address
(gdb) x/10i $eip — next 10 instructions
(gdb) x/1xg $rsp — 1 hex giant (8 bytes) at RSP (64-bit)
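One detail matters when reading x output on x86: memory is little-endian, so the least significant byte of a word is stored first. The sketch below models how a word-sized hex display such as x/xw turns four raw bytes into the number it prints; read_word_le is a hypothetical name, and this models the interpretation, not GDB’s own implementation.

```c
#include <stdint.h>

/* Combine four bytes stored least-significant-first (x86 is little-endian)
 * into the 32-bit value that a word-sized hex display such as x/xw shows. */
uint32_t read_word_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

For example, the leftover word GDB showed as 0x08048420 at [ebp-0x8] is stored in memory as the bytes 20 84 04 08.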
Configuration
| Command | What It Does |
|---|---|
| set disassembly-flavor intel | Switch to Intel syntax |
| set pagination off | Disable the “press enter” pager |
| layout asm | TUI mode: shows disassembly in a split view |
| layout regs | TUI mode: shows registers alongside disassembly |
What to Try Next
This was a simple binary with straightforward logic. To level up your reverse engineering skills, try these:
- Loops: Disassemble a binary with for or while loops. Look for cmp + jl/jle and backward jumps (a jump to a lower address usually indicates a loop).
- Function calls: Reverse a binary that calls multiple functions. Trace the arguments on the stack and the return values in EAX.
- Strings and arrays: Reverse a binary that iterates over a string or array. Watch how pointer arithmetic works in assembly.
- Structs: Reverse a binary with struct access. You’ll see base+offset patterns like [eax+0x8].
- Switch statements: These often compile into jump tables. Disassemble one and figure out how the cases are dispatched.
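The best way to practice is to write a small C program, compile it without optimizations (gcc -O0 -m32 -o program program.c), and then reverse it in GDB without looking at the source. Compare your reconstruction against the original to check your work. On a 64-bit system, -m32 requires the 32-bit multilib packages (for example gcc-multilib on Debian and Ubuntu), and a recent GCC may emit a somewhat different prologue than the one in this tutorial, so expect some differences in the details.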
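If you want a ready-made first exercise, here is one possible practice file (square and sum_of_squares are placeholder names) that combines two of the patterns above, a loop and a function call. It has no main on purpose: add one that calls sum_of_squares and prints the result, compile it with the command above, then try to recover the logic in GDB before rereading the source.

```c
/* Practice target: a helper function plus a counted loop that calls it. */
int square(int n)
{
    return n * n;
}

int sum_of_squares(int limit)
{
    int total = 0;

    /* A loop: in the disassembly, look for a compare and a jump back to a lower address. */
    for (int i = 1; i <= limit; i++)
        total += square(i);   /* a call: watch how i is passed on the stack */

    return total;             /* the return value comes back in EAX */
}
```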
Happy reversing!