Jun 19, 2020

Debugging Binaries with GDB

GDB is shipped with the GNU toolset. It is a debugging tool used in Linux environments. The term GDB stands for GNU Debugger.

In our previous protostar stack0 walkthrough tutorial, we used GDB many times. 

So in this post, I'm going to explain how to use a Linux debugger to debug and analyze a binary file. If you are planning to learn reverse engineering, malware analysis, or exploit development, you must be familiar with debuggers. 

To understand disassembly, the stack, and related concepts, I suggest you read the following tutorials.

Starting the debugging.

We can open a binary inside GDB with the command gdb ./[binary_file]. Here binary_file is the name of the file we want to debug. You may see the following screen after this command.

[Screenshot: GDB main interface on Kali]

However, in general we don't need this banner. If it bothers you, you can use quiet mode, which prevents GDB from showing the welcome banner. 

gdb -q ./stack0

So guys, our next step is to disassemble the binary and understand the structure of the program.

Disassemble a binary.

There are two main Assembly syntax styles called Intel syntax and AT&T syntax. In the following image, you can see both of them.

[Image: Intel and AT&T Assembly syntax]

You can select whichever you prefer. I think Intel syntax is clear and easy to understand, so in my disassembly I use Intel Assembly syntax.

By default, gdb uses AT&T assembly syntax. We can switch to Intel assembly syntax by entering the following command.

set disassembly-flavor intel

If you want to switch back, use the command set disassembly-flavor att

If you find it boring to switch syntax every time you start GDB, you can make Intel syntax permanent by editing the .gdbinit file. This file is located in your home folder. The following command appends the setting to it (it uses >> instead of > so that any settings already in the file are kept).

echo 'set disassembly-flavor intel' >> ~/.gdbinit

After you load a binary in GDB, you can disassemble a function and see what its assembly code looks like. To do that, you can use disassemble function_name or a shorter form such as disas function_name. For example, if you want to disassemble the main function, you may use disassemble main or disas main.

(gdb) disass main
Dump of assembler code for function main:
0x080483f4 <main+0>:    push   ebp
0x080483f5 <main+1>:    mov    ebp,esp
0x080483f7 <main+3>:    and    esp,0xfffffff0
0x080483fa <main+6>:    sub    esp,0x60
0x080483fd <main+9>:    mov    DWORD PTR [esp+0x5c],0x0
0x08048405 <main+17>:   lea    eax,[esp+0x1c]
0x08048409 <main+21>:   mov    DWORD PTR [esp],eax
0x0804840c <main+24>:   call   0x804830c <gets@plt>
0x08048411 <main+29>:   mov    eax,DWORD PTR [esp+0x5c]
0x08048415 <main+33>:   test   eax,eax
0x08048417 <main+35>:   je     0x8048427 <main+51>
0x08048419 <main+37>:   mov    DWORD PTR [esp],0x8048500
0x08048420 <main+44>:   call   0x804832c <puts@plt>
0x08048425 <main+49>:   jmp    0x8048433 <main+63>
0x08048427 <main+51>:   mov    DWORD PTR [esp],0x8048529
0x0804842e <main+58>:   call   0x804832c <puts@plt>
0x08048433 <main+63>:   leave
0x08048434 <main+64>:   ret
End of assembler dump.

On the left side, we can see memory addresses. Our CPU instructions are loaded in there. On the right side, there are assembly instructions like push ebp, mov ebp,esp, etc. These Assembly instructions do various tasks on CPU, memory, and registers.
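To make the address column a little more concrete, here is a small Python sketch (not part of GDB, just plain arithmetic) that takes the first few addresses from the listing above and works out the <main+N> offsets and the size of each instruction. The variable names are my own.

```python
# Addresses copied from the first six lines of the `disass main` listing.
addresses = [0x080483f4, 0x080483f5, 0x080483f7,
             0x080483fa, 0x080483fd, 0x08048405]
main_start = addresses[0]

# The <main+N> label is the address minus the start of main.
offsets = [addr - main_start for addr in addresses]

# The size of an instruction is the gap to the next address.
sizes = [nxt - cur for cur, nxt in zip(addresses, addresses[1:])]

for addr, off, size in zip(addresses, offsets, sizes):
    print(f"0x{addr:08x} <main+{off}>: {size} byte(s)")
```

Notice that x86 instructions do not all have the same length. Here push ebp takes 1 byte, while the mov at main+9 takes 8 bytes.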

Breakpoints

Breakpoints are essential in debugging. With a breakpoint we can stop the execution of the program at a chosen point and examine the memory and registers. You can set a breakpoint on a function with the command break. For example, if you want to break execution at the main function, you may use break main or the shorthand command b main.

Let's make a breakpoint on the above binary and run it to see what happens.

(gdb) b main
Breakpoint 1 at 0x80483fd: file stack0/stack0.c, line 10.
(gdb) run
Starting program: /opt/protostar/bin/stack0

Breakpoint 1, main (argc=1, argv=0xbffff864) at stack0

I used the b main command, so GDB created a breakpoint at the memory address 0x80483fd. Go back to the disassembled code above and find out what is at that address. 

The instruction at this address is mov    DWORD PTR [esp+0x5c],0x0. So GDB has skipped the following instructions.

0x080483f4 <main+0>:    push   ebp
0x080483f5 <main+1>:    mov    ebp,esp
0x080483f7 <main+3>:    and    esp,0xfffffff0
0x080483fa <main+6>:    sub    esp,0x60

This is because these instructions belong to the function prologue generated by the compiler. The function prologue builds the stack frame of the function. 

When you set a breakpoint with the function name, GDB automatically skips the function prologue. So if you want to see how the stack frame is built, you can use a memory address instead of the function name.

Let's set a breakpoint at the top of the assembly instructions.

(gdb) b *0x080483f4
Breakpoint 2 at 0x80483f4: file stack0/stack0.c, line 6.

Notice the asterisk before the memory address. It tells GDB to treat the value as an address rather than as a function name or line number.

Examine the memory and the registers

This is the most important part of our reverse engineering task. We can use various ways to examine the memory and the registers to see what is inside them.

Examine registers

To examine registers we must run the program. What we do is set a breakpoint at the point we are interested in and run the program. After GDB stops the execution, we can examine the registers.

I have now created a breakpoint at the main function using b main and started the program. So at the moment, GDB has paused the execution at the main function.

We can use the info registers command or the shorthand command i r to examine all registers. See the following example.

(gdb) i r
eax            0xbffff864       -1073743772
ecx            0xa9493a07       -1454818809
edx            0x1      1
ebx            0xb7fd7ff4       -1208123404
esp            0xbffff750       0xbffff750
ebp            0xbffff7b8       0xbffff7b8
esi            0x0      0
edi            0x0      0
eip            0x80483fd        0x80483fd <main+9>
eflags         0x200286 [ PF SF IF ID ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

So, guys, GDB listed all registers and their current values.
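The eflags line is worth a closer look. GDB prints the raw value 0x200286 and then the names of the flags that are set. The Python sketch below decodes the same value by hand. The bit positions are the standard x86 ones, but the table only covers a handful of common flags, and the function name decode_eflags is just something I made up for this example.

```python
# Bit positions of some common x86 EFLAGS flags (not a complete table).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11, "ID": 21,
}

def decode_eflags(value):
    """Return the names of the flags whose bit is set in value."""
    return [name for name, bit in FLAG_BITS.items() if value & (1 << bit)]

eflags = 0x200286  # value taken from the `i r` output above
print(decode_eflags(eflags))  # ['PF', 'SF', 'IF', 'ID']
```

The result matches the [ PF SF IF ID ] list that GDB printed. Bit 1 is also set in 0x200286, but it is a reserved bit with no flag name.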

We can also examine a specific register by name using the i r [register_name] command. Let's see what is inside the esp register.

(gdb) i r esp
esp            0xbffff750       0xbffff750

We can examine multiple registers at once as follows.

(gdb) i r esp eip
esp            0xbffff750       0xbffff750
eip            0x80483fd        0x80483fd <main+9>

We know the esp register is pointing to the top of the stack. So by examining the esp register we can find the address of the top of the stack. 
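As a rough check, we can replay the function prologue from the disassembly in plain Python and see whether it lands on the same esp value. This is only a sketch: it assumes the ebp value shown by i r is the one set by mov ebp,esp in the prologue, and the variable names are mine.

```python
MASK32 = 0xFFFFFFFF  # keep results inside 32 bits

ebp = 0xbffff7b8           # ebp from the `i r` output
esp_from_gdb = 0xbffff750  # esp from the `i r` output

# After `push ebp` and `mov ebp,esp`, esp holds the same value as ebp.
esp = ebp
esp &= 0xfffffff0            # and esp,0xfffffff0 (align to 16 bytes)
esp = (esp - 0x60) & MASK32  # sub esp,0x60 (reserve 96 bytes)

print(hex(esp))  # 0xbffff750

# Addresses of the two stack operands seen in the disassembly.
flag_addr = esp + 0x5c    # target of mov DWORD PTR [esp+0x5c],0x0
buffer_addr = esp + 0x1c  # address loaded by lea eax,[esp+0x1c]
print(hex(flag_addr), hex(buffer_addr))  # 0xbffff7ac 0xbffff76c
```

The computed value is the same as the esp that GDB reported, so the prologue explains exactly where the top of the stack ended up.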

Examine memory addresses

Here we are going to see how to examine a memory address. The command we use is x [memory_address]. 

Above, we saw that the EIP register contains the value 0x80483fd. This should be the memory address of the next instruction waiting to be executed by the CPU. Let's see what is at that location.

(gdb) x 0x80483fd
0x80483fd <main+9>:     0x5c2444c7

We can do both of the above steps at once. We can refer to the value the eip register holds by writing $eip. So we can examine the memory address pointed to by eip with the command x $eip.

(gdb) x $eip
0x80483fd <main+9>:     0x5c2444c7
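The value 0x5c2444c7 may look odd next to the instruction we expect at this address. That is because x86 is little-endian, so GDB shows the four bytes in memory as one word with the byte order reversed. Here is a minimal Python sketch that turns the word back into the bytes as they sit in memory.

```python
import struct

word = 0x5c2444c7  # value printed by `x $eip`

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# which gives the bytes in the order they are stored in memory.
raw = struct.pack("<I", word)

print(raw.hex(" "))  # c7 44 24 5c
```

These are the first four bytes of the instruction at main+9. You can even spot the 0x5c from [esp+0x5c] as the last of them.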

The examine command can be customized to suit our needs. For example, we can specify the format in which GDB prints the data. By default, GDB prints values in hexadecimal format. The command to switch format is x/[format] [memory_address].

The following are some data type formats.

  • x : Hexadecimal format
  • o : Octal format
  • u : Unsigned decimal format
  • t : Binary format
  • d : Signed decimal format

Most of the time we use binary and decimal types.

(gdb) x/x $eax
0xbffff864:     0xbffff975
(gdb) x/o $eax
0xbffff864:     027777774565
(gdb) x/u $eax
0xbffff864:     3221223797
(gdb) x/t $eax
0xbffff864:     10111111111111111111100101110101
(gdb) x/d $eax
0xbffff864:     -1073743499
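All five outputs above are the same 32-bit word, 0xbffff975, shown in different ways. If you want to convince yourself, this short Python sketch reproduces each one. The helper to_signed32 is my own name for the usual two's complement conversion.

```python
value = 0xbffff975  # the word stored at the address in eax

def to_signed32(v):
    """Interpret a 32-bit value as a two's complement signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v

formats = {
    "x": f"0x{value:08x}",         # hexadecimal
    "o": f"0{value:o}",            # octal, with GDB's leading 0
    "u": str(value),               # unsigned decimal
    "t": f"{value:032b}",          # binary, 32 digits
    "d": str(to_signed32(value)),  # signed decimal
}

for letter, text in formats.items():
    print(f"x/{letter} -> {text}")
```

The d format gives a negative number because the highest bit of the word is set.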

There are also some special formats. If we think there is a character string at a memory address, we can specify the string format s, and GDB prints the raw bytes at that address as a string.

In the disassembly above, we can see that the CPU places the memory address 0x8048529 at the top of the stack and then calls puts@plt. So we can guess there should be a string at this memory address. Here I examined that address.

(gdb) x/s 0x8048529
0x8048529:       "Try again?"

Next, we can print a CPU instruction by specifying the format i. We know the eip register points to the next CPU instruction, so we can check what that instruction is with the following command.

(gdb) x/i $eip
0x80483fd <main+9>:     mov    DWORD PTR [esp+0x5c],0x0

We can also specify the number of units to show. By default, GDB shows one unit. Here a unit is one word, and the word size on a 32-bit architecture is 4 bytes. The syntax to specify the number of units is x/[unit_number][format] [memory_address].

Let's examine 20 words from the top of the stack.

(gdb) x/20x $esp
0xbffff750:     0x00000000      0x00000001      0xb7fff8f8      0xb7f0186e
0xbffff760:     0xb7fd7ff4      0xb7ec6165      0xbffff778      0xb7eada75
0xbffff770:     0xb7fd7ff4      0x08049620      0xbffff788      0x080482e8
0xbffff780:     0xb7ff1040      0x08049620      0xbffff7b8      0x08048469
0xbffff790:     0xb7fd8304      0xb7fd7ff4      0x08048450      0xbffff7b8
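Reading this dump takes a little practice. Each row starts with the address of its first word, and each following word sits 4 bytes higher. The Python sketch below copies the dump into a list and looks up a word by its address. The helper names address_of and word_at are hypothetical, used only for this illustration.

```python
WORD_SIZE = 4
esp = 0xbffff750  # address of the first word in the dump

# The 20 words copied from the x/20x output, row by row.
words = [
    0x00000000, 0x00000001, 0xb7fff8f8, 0xb7f0186e,
    0xb7fd7ff4, 0xb7ec6165, 0xbffff778, 0xb7eada75,
    0xb7fd7ff4, 0x08049620, 0xbffff788, 0x080482e8,
    0xb7ff1040, 0x08049620, 0xbffff7b8, 0x08048469,
    0xb7fd8304, 0xb7fd7ff4, 0x08048450, 0xbffff7b8,
]

def address_of(index):
    """Address of the word at the given position in the dump."""
    return esp + index * WORD_SIZE

def word_at(address):
    """Word stored at the given address (must lie inside the dump)."""
    return words[(address - esp) // WORD_SIZE]

print(hex(address_of(15)))        # 0xbffff78c
print(hex(word_at(esp + 0x1c)))   # 0xb7eada75
```

The second lookup uses esp+0x1c, the address that lea eax,[esp+0x1c] loads in the disassembly, so it shows what currently sits at the start of that stack area.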

I think you got an idea about examining the memory.

Running, Continuing, and stepping the execution.

We can start the execution of the program with the command run. If we want to give command line arguments, we can supply them after the run command as follows.


(gdb) run AAAAAA

If there is a breakpoint, GDB stops the execution at the specified location. So if we want to continue the execution, we can use the command continue or the shorthand command c.

We can also execute a single CPU instruction at a time using the ni command. ni is short for nexti, which stands for next instruction.

So guys that's all for this tutorial. I think you enjoyed it. Feel free to leave a comment. Thank you for reading.
