Debugging Binaries with GDB
GDB, the GNU Debugger, ships with the GNU toolchain and is the standard debugger in Linux environments.
In our previous protostar stack0 walkthrough tutorial, we used GDB many times.
So in this post, I'm going to explain how to use GDB to debug and analyze a binary file. If you are planning to learn reverse engineering, malware analysis, or exploit development, you must be familiar with debuggers.
To understand the disassembly, the stack, and related concepts, I suggest you read the following tutorials first:
- Introduction to assembly language
- Stack architecture theory
Starting the debugging
We can open a binary inside GDB with the command gdb ./[binary_file]. Here binary_file is the name of the file we want to debug. You should see the following screen after running this command.
[Image: GDB main interface on Kali]
In general, we don't need this banner. If it gets in your way, you can start GDB in quiet mode with the -q option, which prevents GDB from showing the welcome banner.
gdb -q ./stack0
So guys, our next step is to disassemble the binary and understand the architecture of the program.
Disassemble a binary
There are two main assembly syntax styles: Intel syntax and AT&T syntax. The following image shows both of them.
[Image: Intel and AT&T assembly syntax]
The most visible difference is operand order: Intel syntax puts the destination first (mov ebp,esp), while AT&T syntax puts the source first and prefixes registers with % (mov %esp,%ebp). You can use whichever you prefer. I find Intel syntax clearer and easier to read, so I use it in my disassembly.
By default, GDB uses AT&T assembly syntax. We can switch to Intel assembly syntax by entering the following command.
set disassembly-flavor intel
If you want to switch back, use the command set disassembly-flavor att.
If you find it tedious to switch syntax every time you start GDB, you can make Intel syntax the default by adding the setting to the .gdbinit file in your home folder. GDB reads this file at startup. The following command appends the setting (it uses >> rather than > so that an existing .gdbinit is not overwritten):
echo 'set disassembly-flavor intel' >> ~/.gdbinit
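If you run that echo command more than once, the same line ends up in the file several times. Here is a minimal Python sketch of a repeatable version. The function name ensure_gdbinit_line is my own and is not part of GDB. It only adds the line when it is missing, and the demo works on a temporary file rather than your real ~/.gdbinit.

```python
from pathlib import Path
import tempfile


def ensure_gdbinit_line(gdbinit: Path, line: str = "set disassembly-flavor intel") -> bool:
    """Append `line` to `gdbinit` unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    text = gdbinit.read_text() if gdbinit.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing file has no trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with gdbinit.open("a") as handle:
        handle.write(f"{separator}{line}\n")
    return True


if __name__ == "__main__":
    # Demo on a throwaway file; pass Path.home() / ".gdbinit" to touch the real one.
    with tempfile.TemporaryDirectory() as folder:
        demo_file = Path(folder) / ".gdbinit"
        print(ensure_gdbinit_line(demo_file))  # first call adds the line
        print(ensure_gdbinit_line(demo_file))  # second call changes nothing
```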
After you load a binary in GDB, you can disassemble a function to see its assembly code with disassemble function_name or the shorthand disas function_name. For example, to disassemble the main function, use disassemble main or disas main. GDB accepts any unambiguous abbreviation of a command, so disass main works too.
(gdb) disass main
Dump of assembler code for function main:
0x080483f4 <main+0>: push ebp
0x080483f5 <main+1>: mov ebp,esp
0x080483f7 <main+3>: and esp,0xfffffff0
0x080483fa <main+6>: sub esp,0x60
0x080483fd <main+9>: mov DWORD PTR [esp+0x5c],0x0
0x08048405 <main+17>: lea eax,[esp+0x1c]
0x08048409 <main+21>: mov DWORD PTR [esp],eax
0x0804840c <main+24>: call 0x804830c
0x08048411 <main+29>: mov eax,DWORD PTR [esp+0x5c]
0x08048415 <main+33>: test eax,eax
0x08048417 <main+35>: je 0x8048427 <main+51>
0x08048419 <main+37>: mov DWORD PTR [esp],0x8048500
0x08048420 <main+44>: call 0x804832c
0x08048425 <main+49>: jmp 0x8048433 <main+63>
0x08048427 <main+51>: mov DWORD PTR [esp],0x8048529
0x0804842e <main+58>: call 0x804832c
0x08048433 <main+63>: leave
0x08048434 <main+64>: ret
End of assembler dump.
On the left side, we can see the memory addresses where the CPU instructions are loaded. On the right side are the assembly instructions themselves, such as push ebp and mov ebp,esp. These instructions perform operations on the CPU registers and memory.
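To get a feel for reading such a listing, it can help to write the branch logic in a high-level language. The sketch below mirrors only the test eax,eax / je part of the main function above. The listing does not name the two call targets, so I leave them as opaque addresses. The function name main_control_flow and the parameter name modified are my own labels, not names taken from the binary.

```python
def main_control_flow(modified: int) -> int:
    """Return the address main stores at [esp] before its final call.

    `modified` stands for the 32-bit value read from [esp+0x5c].
    """
    eax = modified & 0xFFFFFFFF   # mov eax,DWORD PTR [esp+0x5c]
    zero_flag = (eax & eax) == 0  # test eax,eax sets ZF when eax is zero
    if zero_flag:                 # je 0x8048427 is taken when ZF is set
        return 0x8048529          # mov DWORD PTR [esp],0x8048529
    return 0x8048500              # mov DWORD PTR [esp],0x8048500


if __name__ == "__main__":
    print(hex(main_control_flow(0)))           # value left at its initial 0
    print(hex(main_control_flow(0x41414141)))  # value changed to something non-zero
```

So the value at [esp+0x5c] alone decides which of the two addresses is passed to the last call.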
Breakpoints
Breakpoints are essential in debugging. They let us stop the execution of the program at a chosen point and examine the memory and registers. You can set a breakpoint on a function with the break command. For example, if you want to break execution at the main function, use break main or the shorthand b main.
Let's set a breakpoint on the above binary and run it to see what happens.
(gdb) b main
Breakpoint 1 at 0x80483fd: file stack0/stack0.c, line 10.
(gdb) run
Starting program: /opt/protostar/bin/stack0
Breakpoint 1, main (argc=1, argv=0xbffff864) at stack0
I used the b main command, so GDB created a breakpoint at the memory address 0x80483fd. Go back to the disassembled code above and find out what is at that address.
The instruction at this address is mov DWORD PTR [esp+0x5c],0x0. So GDB has skipped the following instructions:
0x080483f4 <main+0>: push ebp
0x080483f5 <main+1>: mov ebp,esp
0x080483f7 <main+3>: and esp,0xfffffff0
0x080483fa <main+6>: sub esp,0x60
This is because these instructions are related to the function prologue generated by the compiler. The function prologue builds the stack frame of the function.
When you set a breakpoint with the function name, GDB automatically skips the function prologue. So if you want to see how the stack frame is built, you can use a memory address instead of the function name.
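Here is a rough Python sketch of what the four prologue instructions do to esp and ebp. The entry value 0xbffff7bc is an assumption on my part, picked so that the result lines up with the esp and ebp values in the register dump later in this post. The function name run_prologue is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide on this target


def run_prologue(esp_on_entry: int) -> tuple:
    """Apply the four prologue instructions and return (esp, ebp)."""
    esp = (esp_on_entry - 4) & MASK32  # push ebp: the stack grows down by 4 bytes
    ebp = esp                          # mov ebp,esp: ebp marks the base of the new frame
    esp &= 0xFFFFFFF0                  # and esp,0xfffffff0: align esp to 16 bytes
    esp = (esp - 0x60) & MASK32        # sub esp,0x60: reserve 96 bytes for locals
    return esp, ebp


if __name__ == "__main__":
    esp, ebp = run_prologue(0xBFFFF7BC)  # assumed value of esp when main is entered
    print(hex(esp), hex(ebp))
```

The alignment step is why the distance between ebp and esp can be larger than the 0x60 bytes reserved by the sub instruction.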
Let's set a breakpoint at the top of the assembly instructions.
(gdb) b *0x080483f4
Breakpoint 2 at 0x80483f4: file stack0/stack0.c, line 6.
Notice the asterisk before the memory address. It tells GDB to treat the value as an address rather than as a function name or a line number.
Examine the memory and the registers
This is the most important part of our reverse engineering task. We can use various ways to examine the memory and the registers to see what is inside them.
Examine registers
To examine registers, the program must be running. We set a breakpoint at the point we are interested in and run the program. After GDB stops the execution, we can examine the registers.
I have set a breakpoint at the main function using b main and started the program, so at the moment GDB has paused the execution at the main function.
We can use the info registers command or the shorthand i r to examine all registers. See the following example:
(gdb) i r
eax 0xbffff864 -1073743772
ecx 0xa9493a07 -1454818809
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff750 0xbffff750
ebp 0xbffff7b8 0xbffff7b8
esi 0x0 0
edi 0x0 0
eip 0x80483fd 0x80483fd
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
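So, guys, GDB listed all registers and their current values. The second column is the raw value in hexadecimal. The third column shows the same value in a more natural form: a signed decimal number for general-purpose registers, an address for pointer registers like esp and eip, and the names of the set flags for eflags.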
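If the negative numbers in the dump look odd, the following Python sketch reproduces them. It reinterprets a 32-bit value as a signed number and decodes the eflags value into flag names. The helper names to_signed32 and decode_eflags are my own. The bit positions come from the x86 EFLAGS layout.

```python
def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit register value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# Bit position -> flag name in the x86 EFLAGS register.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value: int) -> list:
    """Return the names of the flags that are set, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    print(to_signed32(0xBFFFF864))  # the eax value from the dump
    print(decode_eflags(0x200286))  # the eflags value from the dump
```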
We can also examine a specific register by name with the command i r [register_name]. Let's see what is inside the esp register.
(gdb) i r esp
esp 0xbffff750 0xbffff750
The esp register holds the address of the top of the stack. The ebp register holds the base address of the current stack frame. And the eip register holds the address of the next instruction to execute.
Examine memory addresses
Let's examine what is stored at the address held by the eip register.
(gdb) x $eip
0x80483fd <main+9>: 0x60ec8b55
The x command shows the content of the memory address. This content might be an instruction or data. By default, GDB shows the memory content in hexadecimal and as 4 bytes at a time.
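One thing to keep in mind when reading a 4-byte word from x: x86 is little-endian, so the lowest byte of the printed word is the first byte in memory. Here is a minimal sketch with Python's struct module. The helper names are my own, and the word is the one printed above.

```python
import struct


def word_to_memory_bytes(word: int) -> bytes:
    """Return the 4 bytes as they sit in memory for a little-endian 32-bit word."""
    return struct.pack("<I", word)


def memory_bytes_to_word(raw: bytes) -> int:
    """Combine 4 bytes from memory into the word that x/x would display."""
    return struct.unpack("<I", raw)[0]


if __name__ == "__main__":
    raw = word_to_memory_bytes(0x60EC8B55)
    print(" ".join(f"{byte:02x}" for byte in raw))  # bytes in memory order
    print(hex(memory_bytes_to_word(raw)))           # back to the displayed word
```

To see the bytes in memory order directly in GDB, you can use the byte unit, for example x/4xb $eip.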
The x command has various formats to show the memory contents. Some of them are:
- x: hexadecimal
- o: octal
- u: unsigned decimal
- t: binary
- d: decimal
So, to view a value in binary, you can use the t format.
x/t $eip
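All of these formats show the same bytes, just written differently. The following Python sketch renders one 32-bit word in each of the five formats listed above. The function name format_word is hypothetical, and the exact padding and prefixes GDB prints may differ from what this sketch produces.

```python
def format_word(word: int, fmt: str) -> str:
    """Render a 32-bit word the way the x format letters interpret it."""
    word &= 0xFFFFFFFF
    if fmt == "x":  # hexadecimal
        return f"0x{word:08x}"
    if fmt == "o":  # octal
        return f"0{word:o}"
    if fmt == "u":  # unsigned decimal
        return str(word)
    if fmt == "t":  # binary, one digit per bit
        return f"{word:032b}"
    if fmt == "d":  # signed decimal (two's complement)
        return str(word - 0x100000000 if word & 0x80000000 else word)
    raise ValueError(f"unsupported format letter: {fmt}")


if __name__ == "__main__":
    for letter in "xoutd":
        print(letter, format_word(0x60EC8B55, letter))
```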
You may want to see several values starting from a base address. In that case, use the form x/[number][format][unit] [address]. Here number is how many units to show, format is one of the letters above, and unit is the size of each unit: b (1 byte), h (2 bytes), w (4 bytes), or g (8 bytes). For example, to check 20 words from the top of the stack, use:
x/20x $esp
The above command shows the contents of 20 consecutive 4-byte words, starting at the top of the stack, in hexadecimal format.
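To make it clear which part of memory x/20x $esp covers, here is a small Python sketch that works out the address at the start of each output row. It assumes 4-byte words and four words per row, which is how GDB usually lays out hexadecimal words. The esp value is the one from the register dump above, and the function name examine_rows is my own.

```python
def examine_rows(start: int, count: int, unit_size: int = 4, units_per_row: int = 4) -> list:
    """Return (row_address, units_in_row) pairs for an x/<count>x style dump."""
    rows = []
    for first_unit in range(0, count, units_per_row):
        units_in_row = min(units_per_row, count - first_unit)
        rows.append((start + first_unit * unit_size, units_in_row))
    return rows


if __name__ == "__main__":
    esp = 0xBFFFF750  # esp from the register dump
    for address, units in examine_rows(esp, 20):
        print(hex(address), units)
    print("first byte after the dump:", hex(esp + 20 * 4))
```

With these assumptions the dump covers 80 bytes, which is less than the 0x60 bytes reserved in the prologue, so it stays inside the stack frame of main.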
Run, Continue and Step
The run command starts the execution of the program. Sometimes, we may want to test a program with various arguments. We can provide arguments with the run command.
(gdb) run AAAAAA
The above command passes AAAAAA as a command-line argument to the program and starts its execution. This is a useful technique to check buffer overflows.
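Typing long strings of A characters by hand gets old quickly, so here is a small Python sketch that builds the run command for a chosen length. The offsets 0x1c and 0x5c come from the disassembly of main earlier in this post. Treating the first as the start of a buffer and the second as the variable that main tests afterwards is my assumption, based on how the listing uses them. Both function names are hypothetical.

```python
BUFFER_OFFSET = 0x1C    # lea eax,[esp+0x1c]: assumed start of the buffer
VARIABLE_OFFSET = 0x5C  # mov DWORD PTR [esp+0x5c],0x0: variable tested later


def buffer_to_variable_distance() -> int:
    """Number of bytes between the assumed buffer start and the tested variable."""
    return VARIABLE_OFFSET - BUFFER_OFFSET


def gdb_run_command(length: int, fill: str = "A") -> str:
    """Build a GDB run command that passes `length` fill characters as the argument."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return "run " + fill * length


if __name__ == "__main__":
    print(buffer_to_variable_distance())
    print(gdb_run_command(6))
```

You can paste the generated command at the (gdb) prompt and increase the length step by step while watching the stack with x/20x $esp.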
If a breakpoint is hit, we can continue the execution with the continue command or the shorthand c.
The ni (nexti) command executes one CPU instruction at a time and steps over function calls. After each instruction, GDB pauses the execution again. To step into a called function instead, use si (stepi).