GDB Reverse Engineering Tutorial
Thilan Dissanayaka Exploit Development January 11, 2020

Here’s a challenge. You have a compiled binary. No source code. No documentation. No debug symbols. All you have is the executable file. Can you figure out what it does — not just by running it, but by understanding the logic inside it, instruction by instruction?

That’s reverse engineering. And GDB (the GNU Debugger) is one of the best tools for doing it on Linux. In this tutorial, we’ll take a simple binary, disassemble it, step through it in GDB, and reconstruct the original C source code from nothing but assembly.

What is GDB?

GDB is a command-line debugger that runs on Linux and most other Unix-like systems. It lets you:

  • Disassemble compiled code back into assembly instructions
  • Set breakpoints to pause execution at specific instructions
  • Step through code one instruction at a time
  • Inspect registers and memory to see values as they change
  • Examine the stack to understand function calls and local variables

It’s the go-to tool for exploit developers, reverse engineers, and anyone who needs to understand what a binary is actually doing under the hood. Think of it as a microscope for programs.

The Target Binary

Let’s start by running our mystery binary to see what it does:

user@protostar:~$ ./rev
HacksLand
user@protostar:~$

It prints the string “HacksLand” and exits. Simple enough. Based on this output, we might guess the source code looks something like:

#include <stdio.h>

int main() {
    printf("HacksLand\n");
    return 0;
}

But is it really that simple? Let’s find out. There might be variables, conditions, loops — things that don’t show up in the output but are part of the program’s logic. The only way to know for sure is to disassemble it.

Loading the Binary in GDB

user@protostar:~$ gdb -q ./rev
Reading symbols from /home/user/rev...(no debugging symbols found)...done.

The -q flag (quiet) suppresses GDB’s startup banner. The “(no debugging symbols found)” message tells us the binary was compiled without debug info — we can’t see variable names or line numbers. That’s typical in real-world reverse engineering.

First, let’s switch to Intel syntax. GDB defaults to AT&T syntax, but Intel syntax is cleaner and what most reverse engineers prefer:

(gdb) set disassembly-flavor intel

Now let’s disassemble the main function:

(gdb) disass main

Dump of assembler code for function main:
0x080483c4 <+0>:     push   ebp
0x080483c5 <+1>:     mov    ebp,esp
0x080483c7 <+3>:     sub    esp,0x10
0x080483ca <+6>:     mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <+13>:    mov    DWORD PTR [ebp-0x8],0x3
0x080483d8 <+20>:    mov    eax,DWORD PTR [ebp-0x8]
0x080483db <+23>:    mov    edx,DWORD PTR [ebp-0xc]
0x080483de <+26>:    lea    eax,[edx+eax*1]
0x080483e1 <+29>:    mov    DWORD PTR [ebp-0x4],eax
0x080483e4 <+32>:    cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <+36>:    jg     0x80483f8 <main+52>
0x080483ea <+38>:    mov    DWORD PTR [esp],0x80484d0
0x080483f1 <+45>:    call   0x80482f8 <puts@plt>
0x080483f6 <+50>:    jmp    0x8048404 <main+64>
0x080483f8 <+52>:    mov    DWORD PTR [esp],0x80484da
0x080483ff <+59>:    call   0x80482f8 <puts@plt>
0x08048404 <+64>:    mov    eax,0x0
0x08048409 <+69>:    leave
0x0804840a <+70>:    ret
End of assembler dump.

That’s 19 instructions. It looks intimidating at first, but we’re going to go through every single one. By the end, you’ll read this as naturally as C code.

The Function Prologue — Setting Up the Stack Frame

The first three instructions appear at the beginning of virtually every function. They’re called the function prologue:

0x080483c4 <+0>:     push   ebp
0x080483c5 <+1>:     mov    ebp,esp
0x080483c7 <+3>:     sub    esp,0x10

push ebp — Saves the caller’s base pointer on the stack. We need to restore it later when this function returns.

mov ebp, esp sets up the new base pointer. From this point on, EBP is our fixed reference point: local variables and function arguments are addressed relative to EBP, even if ESP changes later.

sub esp, 0x10 — Allocates 16 bytes (0x10) of space on the stack for local variables. Since the stack grows downward (toward lower addresses), subtracting from ESP reserves space.

After the prologue, the stack looks like this:

Higher Addresses
┌──────────────────────────┐
│  Return address           │  (pushed by CALL instruction)
├──────────────────────────┤
│  Saved EBP (old)          │  ◄── pushed by "push ebp"
├──────────────────────────┤  ◄── EBP points here
│  [ebp-0x4]  (4 bytes)    │  ← local variable (z)
├──────────────────────────┤
│  [ebp-0x8]  (4 bytes)    │  ← local variable (y)
├──────────────────────────┤
│  [ebp-0xc]  (4 bytes)    │  ← local variable (x)
├──────────────────────────┤
│  [ebp-0x10] (4 bytes)    │  ← outgoing argument slot (same address as [esp])
├──────────────────────────┤  ◄── ESP points here
Lower Addresses

Three local variables at [ebp-0x4], [ebp-0x8], and [ebp-0xc]. Let’s see what gets stored in them.

Initializing Variables

0x080483ca <+6>:     mov    DWORD PTR [ebp-0xc],0x2
0x080483d1 <+13>:    mov    DWORD PTR [ebp-0x8],0x3

These two instructions store values into the local variable space on the stack:

  • [ebp-0xc] gets the value 2 (0x2)
  • [ebp-0x8] gets the value 3 (0x3)

DWORD PTR means we’re writing a 4-byte (32-bit) value. That is the size of an int on this platform, so int is the natural guess for the type of these variables.

In C terms, this is:

int x = 2;
int y = 3;

Let’s verify in GDB. We’ll set a breakpoint before these instructions execute, then step through and watch the values change:

(gdb) b *0x080483ca
Breakpoint 1 at 0x80483ca

(gdb) run
Starting program: /home/user/rev
Breakpoint 1, 0x080483ca in main ()

Now let’s examine the memory at [ebp-0xc] before the instruction runs:

(gdb) x/x $ebp-0xc
0xbffff7dc:	0xb7fd7ff4

Garbage value — uninitialized stack memory. Now step one instruction and check again:

(gdb) ni
0x080483d1 in main ()

(gdb) x/x $ebp-0xc
0xbffff7dc:	0x00000002

The value at [ebp-0xc] is now 2. Let’s do the same for [ebp-0x8]:

(gdb) x/x $ebp-0x8
0xbffff7e0:	0x08048420

(gdb) ni
0x080483d8 in main ()

(gdb) x/x $ebp-0x8
0xbffff7e0:	0x00000003

Now [ebp-0x8] contains 3. Exactly as expected. Two local integer variables initialized to 2 and 3.

The Addition

0x080483d8 <+20>:    mov    eax,DWORD PTR [ebp-0x8]
0x080483db <+23>:    mov    edx,DWORD PTR [ebp-0xc]
0x080483de <+26>:    lea    eax,[edx+eax*1]
0x080483e1 <+29>:    mov    DWORD PTR [ebp-0x4],eax

The first two instructions load our variables from the stack into registers:

  • EAX = value at [ebp-0x8] = 3
  • EDX = value at [ebp-0xc] = 2

You might wonder: why not operate on the stack values directly? Why copy them into registers first? x86 arithmetic instructions accept at most one memory operand, so two values that both live in memory cannot be added in a single instruction. Unoptimized compiler output takes the simplest route: load the values into registers, operate on them, then store the result back.

Now the interesting instruction:

lea    eax,[edx+eax*1]

LEA stands for Load Effective Address. Despite its name, it’s commonly used by compilers as a fast way to do arithmetic. [edx+eax*1] calculates EDX + EAX × 1 = EDX + EAX = 2 + 3 = 5. The result is stored in EAX.
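To make the address arithmetic concrete, here is a minimal C sketch of the value LEA produces for an operand of the form [base + index*scale + disp]. The helper names lea_value and lea_from_main are illustrative assumptions; nothing in GDB or the target binary defines them.

```c
#include <stdint.h>

/* Illustrative model of what LEA computes for [base + index*scale + disp].
 * lea_value is a hypothetical helper, not part of GDB or the binary.
 * Unsigned 32-bit arithmetic wraps around, as it does on 32-bit x86. */
uint32_t lea_value(uint32_t base, uint32_t index,
                   uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}

/* The instruction from main: lea eax,[edx+eax*1] with edx = 2, eax = 3. */
uint32_t lea_from_main(void)
{
    uint32_t edx = 2, eax = 3;
    eax = lea_value(edx, eax, 1, 0);
    return eax;
}
```

LEA only computes the address expression and never reads memory at that address, which is why compilers can use it as a plain add-and-scale instruction.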

Let’s verify:

(gdb) i r eax edx
eax            0x3     3
edx            0x2     2

(gdb) ni
0x080483e1 in main ()

(gdb) i r eax
eax            0x5     5

3 + 2 = 5. The i r command (short for info registers) shows register values.

Then mov DWORD PTR [ebp-0x4], eax stores the result (5) into the third local variable at [ebp-0x4].

So far, our reconstructed code is:

int x = 2;
int y = 3;
int z;
z = x + y;    // z = 5

The Comparison and Conditional Jump

Now things get interesting:

0x080483e4 <+32>:    cmp    DWORD PTR [ebp-0x4],0x7
0x080483e8 <+36>:    jg     0x80483f8 <main+52>

cmp DWORD PTR [ebp-0x4], 0x7 — Compares the value at [ebp-0x4] (which is 5) with 7. Internally, CMP performs a subtraction (5 - 7 = -2) without storing the result, but it sets the flags register based on the outcome:

  • Zero Flag (ZF) = 0 (result isn’t zero, so they’re not equal)
  • Sign Flag (SF) = 1 (result is negative, so the first operand is smaller)

jg 0x80483f8 — “Jump if Greater.” This jumps to address 0x80483f8 only if the first operand of the previous CMP was greater than the second. Is 5 > 7? No. So the jump is not taken, and execution falls through to the next instruction.
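The flag logic behind CMP and JG can be sketched in C. Besides ZF and SF, JG also consults the Overflow Flag (OF): the jump is taken when ZF = 0 and SF = OF. The struct and function names below are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three flags JG looks at after "cmp a, b". */
struct flags {
    bool zf; /* result of a - b was zero */
    bool sf; /* top bit of the result was set */
    bool of; /* a - b overflowed as a signed subtraction */
};

struct flags cmp32(int32_t a, int32_t b)
{
    /* Subtract in unsigned arithmetic so wraparound is well defined in C. */
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t res = ua - ub;
    struct flags f;

    f.zf = (res == 0);
    f.sf = (res >> 31) != 0;
    /* Signed overflow: the operands' signs differ and the result's sign
     * differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ res)) >> 31) != 0;
    return f;
}

/* JG ("jump if greater", signed) is taken when ZF == 0 and SF == OF. */
bool jg_taken(struct flags f)
{
    return !f.zf && (f.sf == f.of);
}
```

For cmp32(5, 7) the model gives ZF = 0, SF = 1, OF = 0, so jg_taken returns false, matching the fall-through seen in the binary.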

This is an if/else statement. Read directly off the assembly, the CMP + JG pair gives:

if (z > 7) {
    // jump target (at 0x80483f8)
} else {
    // fall through (next instruction, at 0x80483ea)
}

That looks backwards: the code that directly follows the test ends up in the else block. This is a common pattern in compiled code. The compiler places the if body immediately after the test and jumps over it when the source condition is false, so the jump tests the inverse of what the programmer wrote. For source like:

if (z <= 7) { do_A; } else { do_B; }

It generates:

cmp z, 7
jg  do_B        ; if z > 7, skip to B
do_A             ; otherwise, fall through to A
jmp end          ; skip over B
do_B:
end:

The condition is flipped so that the if body is the fall-through path. This is less an optimization than the most direct way to lay the code out: one conditional jump skips the if body, and one unconditional jump at its end skips the else body.
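The jump layout described above can be written in C with goto, which makes it easier to compare against the structured form. This is only a sketch: branch_layout and structured are hypothetical names, and the characters 'A' and 'B' stand in for do_A and do_B.

```c
/* Mirrors the compiled control flow: one conditional jump over the
 * if body, one unconditional jump over the else body. */
char branch_layout(int z)
{
    char taken;

    if (z > 7)          /* cmp z, 7 ; jg do_B */
        goto do_B;

    taken = 'A';        /* do_A: the fall-through path */
    goto end;           /* jmp end */

do_B:
    taken = 'B';        /* do_B: the jump target */

end:
    return taken;
}

/* The structured source that produces this layout. */
char structured(int z)
{
    if (z <= 7) {
        return 'A';
    } else {
        return 'B';
    }
}
```

Both functions return the same value for every input, which is the sense in which the jg test and the z <= 7 source condition describe the same branch.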

The Two Branches

Now let’s look at both paths.

The “if” Branch (z <= 7 — falls through)

0x080483ea <+38>:    mov    DWORD PTR [esp],0x80484d0
0x080483f1 <+45>:    call   0x80482f8 <puts@plt>
0x080483f6 <+50>:    jmp    0x8048404 <main+64>

mov DWORD PTR [esp], 0x80484d0 — Places the address 0x80484d0 at the top of the stack. This is the argument to the function call that follows. On 32-bit x86 Linux, function arguments are passed on the stack.

What’s at address 0x80484d0? Let’s check in GDB:

(gdb) x/s 0x80484d0
0x80484d0:	"HacksLand"

There it is — the string “HacksLand” sitting in the .rodata (read-only data) section.

call 0x80482f8 <puts@plt> — Calls the puts function (not printf). The compiler often optimizes printf("string\n") into puts("string") when there are no format specifiers — puts automatically appends a newline and is faster.

jmp 0x8048404 <main+64> — After printing, this unconditional jump skips over the else branch and goes directly to the function epilogue. Without this jump, execution would fall through into the else branch and print both strings.
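The printf-to-puts substitution is safe because both calls write the same bytes when the format string has no specifiers and ends in a newline. A small sketch, with hypothetical helper names, shows the two forms side by side:

```c
#include <stdio.h>

/* Both helpers write the same 10 bytes to stdout:
 * the 9 characters of "HacksLand" followed by a newline. */
int print_with_printf(void)
{
    /* printf returns the number of characters written. */
    return printf("HacksLand\n");
}

int print_with_puts(void)
{
    /* puts appends the newline itself and returns a non-negative
     * value on success. */
    return puts("HacksLand");
}
```

The return values differ, which is why a compiler can only make this swap when the program ignores printf's result, as a bare printf statement does.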

The “else” Branch (z > 7 — jump target)

0x080483f8 <+52>:    mov    DWORD PTR [esp],0x80484da
0x080483ff <+59>:    call   0x80482f8 <puts@plt>

Same pattern — load a string address and call puts. But what string is at 0x80484da?

(gdb) x/s 0x80484da
0x80484da:	"HacksLand Overflow"

A different string! So there’s an else branch we never saw when running the program, because the condition z > 7 (5 > 7) was false.

The Function Epilogue

0x08048404 <+64>:    mov    eax,0x0
0x08048409 <+69>:    leave
0x0804840a <+70>:    ret

mov eax, 0x0 — Sets the return value to 0. In the C calling convention, the return value of a function goes in EAX. This is return 0;.

leave — This is shorthand for mov esp, ebp followed by pop ebp. It undoes the function prologue — restores the stack pointer and the caller’s base pointer.

ret — Pops the return address from the stack and jumps to it, returning control to whichever function called main.
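To see that leave really undoes the prologue, here is a toy model of the stack in C. It is a sketch under simplifying assumptions: the stack is an array of 4-byte slots, ESP and EBP are slot indices rather than byte addresses, and all names are hypothetical.

```c
#include <stdint.h>

enum { STACK_SLOTS = 64 };

/* Toy 32-bit stack: each array element is one 4-byte slot. */
struct cpu {
    uint32_t stack[STACK_SLOTS];
    uint32_t esp; /* index of the current top-of-stack slot */
    uint32_t ebp; /* index of the frame base slot */
};

void push32(struct cpu *c, uint32_t value)
{
    c->esp -= 1;              /* the stack grows toward lower addresses */
    c->stack[c->esp] = value;
}

uint32_t pop32(struct cpu *c)
{
    uint32_t value = c->stack[c->esp];
    c->esp += 1;
    return value;
}

/* push ebp ; mov ebp,esp ; sub esp,0x10  (0x10 bytes = 4 slots) */
void prologue(struct cpu *c)
{
    push32(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= 4;
}

/* leave = mov esp,ebp ; pop ebp */
void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop32(c);
}
```

Calling prologue and then leave on the same struct cpu returns ESP and EBP to their starting values, leaving the return address on top of the stack for ret to pop.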

The Reconstructed Source Code

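Putting it all together, we can now reconstruct C source that matches the assembly. The variable names are our own choice, and the binary cannot tell us whether the author wrote printf or puts: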

#include <stdio.h>

int main() {
    int x = 2;
    int y = 3;
    int z;

    z = x + y;    // z = 5

    if (z <= 7) {
        printf("HacksLand\n");
    } else {
        printf("HacksLand Overflow\n");
    }

    return 0;
}

When we ran the binary, we only saw “HacksLand”, the output of the if branch. We had no idea about the condition, the math, or the alternative output. But by reading the assembly instruction by instruction, we reconstructed the complete program logic, including a branch that never executed.

That’s the power of reverse engineering.

GDB Commands Quick Reference

Here’s a summary of every GDB command we used, plus a few more that are essential for reverse engineering:

Command                         Short           What It Does
disass main                     disas main      Disassemble a function
disass 0x08048060, 0x08048080   -               Disassemble an address range
break *0x080483ca               b *0x080483ca   Set breakpoint at an address
run                             r               Start the program
continue                        c               Continue after a breakpoint
nexti                           ni              Step one instruction (skip over calls)
stepi                           si              Step one instruction (step into calls)

Inspection

Command            Short   What It Does
info registers     i r     Show all registers
i r eax edx        -       Show specific registers
x/x $ebp-0xc       -       Examine memory as hex
x/s 0x80484d0      -       Examine memory as string
x/10i $eip         -       Disassemble 10 instructions at EIP
x/20xw $esp        -       Examine 20 words at the stack pointer

The x Command Formats

The x (examine) command is the most versatile tool in GDB. The format is x/NFU address:

  • N = number of units to display
  • F = format: x (hex), d (decimal), s (string), i (instruction), c (char)
  • U = unit size: b (byte), h (halfword/2 bytes), w (word/4 bytes), g (giant/8 bytes)

(gdb) x/4xw $esp         # 4 hex words at ESP (the top of the stack)
(gdb) x/16xb $eax        # 16 hex bytes at the address in EAX
(gdb) x/s 0x80484d0      # string at that address
(gdb) x/10i $eip         # next 10 instructions
(gdb) x/1xg $rsp         # 1 hex giant (8 bytes) at RSP (64-bit)
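The unit size matters because x86 is little-endian: the same four bytes look different when listed one byte at a time (x/4xb) than when shown as a single word (x/1xw). The following C sketch models that relationship; the helper names are hypothetical.

```c
#include <stdint.h>

/* Split a 32-bit word into the byte order x86 stores it in memory
 * (least significant byte first). This is the order x/4xb lists. */
void le_bytes_from_word(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word & 0xff);
    out[1] = (uint8_t)((word >> 8) & 0xff);
    out[2] = (uint8_t)((word >> 16) & 0xff);
    out[3] = (uint8_t)((word >> 24) & 0xff);
}

/* Reassemble the word from bytes in memory order.
 * This is the value x/1xw shows for the same address. */
uint32_t word_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

For the value 2 stored at [ebp-0xc], the byte view is 02 00 00 00 while the word view is 0x00000002.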

Configuration

Command                        What It Does
set disassembly-flavor intel   Switch to Intel syntax
set pagination off             Disable the “press enter” pager
layout asm                     TUI mode: shows disassembly in a split view
layout regs                    TUI mode: shows registers alongside disassembly

What to Try Next

This was a simple binary with straightforward logic. To level up your reverse engineering skills, try these:

  • Loops — Disassemble a binary with for or while loops. Look for cmp + jl/jle + backward jumps (jumps to lower addresses indicate loops)
  • Function calls — Reverse a binary that calls multiple functions. Trace the arguments on the stack and the return values in EAX
  • Strings and arrays — Reverse a binary that iterates over a string or array. Watch how pointer arithmetic works in assembly
  • Structs — Reverse a binary with struct access. You’ll see base+offset patterns like [eax+0x8]
  • Switch statements — These often compile into jump tables. Disassemble one and figure out how the cases are dispatched

The best way to practice is to write a small C program, compile it without optimizations (gcc -O0 -m32 -o program program.c), and then reverse it in GDB without looking at the source. Compare your reconstruction against the original to check your work.
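As one possible starting point for the loops exercise, here is a small function you could build a practice target around. It is a hypothetical example, not taken from any existing binary: add a main that calls sum_to and prints the result, compile it with the command above, and look for the backward jump in the disassembly.

```c
/* Hypothetical practice target: sum the integers 1..n with a for loop.
 * In unoptimized output, expect total and i to live in stack slots and
 * the loop test to end in a conditional jump back to the loop body. */
int sum_to(int n)
{
    int total = 0;

    for (int i = 1; i <= n; i++)
        total += i;

    return total;
}
```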

Happy reversing!
